(57) Abstract



The present invention provides isolated polypeptides useful in the treatment and prevention of malaria caused by Plasmodium 
falciparum or P. vivax. In particular, the polypeptides are derived from the binding domains of the proteins in the DBL family as well as 
the sialic acid binding protein (SABP) on P. falciparum merozoites. The polypeptides may also be derived from the Duffy antigen binding 
protein (DABP) on P. vivax merozoites.

BINDING DOMAINS FROM PLASMODIUM VIVAX AND
PLASMODIUM FALCIPARUM ERYTHROCYTE BINDING PROTEINS

BACKGROUND OF THE INVENTION 
Malaria infects 200-400 million people each year, causing 1-2 million deaths, and thus remains one
of the most important infectious diseases in the world. Approximately 25 percent of all deaths of children in rural
Africa between the ages of one and four years are caused by malaria. Due to the importance of the disease as a
worldwide health problem, considerable effort is being expended to identify and develop malaria vaccines.

Malaria in humans is caused by four species of the parasite Plasmodium: P. falciparum, P. vivax,
P. knowlesi and P. malariae. The major cause of malaria in humans is P. falciparum, which infects 200 million to
400 million people every year, killing 1 to 4 million.

Duffy Antigen Binding Protein (DABP) and Sialic Acid Binding Protein (SABP) are soluble proteins
that appear in the culture supernatant after infected erythrocytes release merozoites. Immunochemical data indicate
that DABP and SABP, which are the respective ligands for the P. vivax and P. falciparum Duffy and sialic acid
receptors on erythrocytes, possess binding specificities that are identical in soluble and membrane-bound
form.

DABP is a 135 kDa protein which binds specifically to Duffy blood group determinants (Wertheimer
et al., Exp. Parasitol. 69: 340-350 (1989); Barnwell, et al., J. Exp. Med. 169: 1795-1802 (1989)). Thus, binding
of DABP is specific to human Duffy positive erythrocytes. There are four major Duffy phenotypes for human
erythrocytes: Fy(a), Fy(b), Fy(ab) and Fy(negative), as defined by the anti-Fya and anti-Fyb sera (Hadley et al., In Red
Cell Antigens and Antibodies, G. Garratty, ed. (Arlington, Va.: American Association of Blood Banks) pp. 17-33 (1986)).
DABP binds equally to both Fy(a) and Fy(b) erythrocytes, which are equally susceptible to invasion by P. vivax, but
not to Fy(negative) erythrocytes.

In the case of SABP, a 175 kDa protein, binding is specific to the glycophorin sialic acid residues
on erythrocytes (Camus and Hadley, Science 230:553-556 (1985); Orlandi, et al., J. Cell Biol. 116:901-909 (1992)).
Thus, neuraminidase treatment (which cleaves off sialic acid residues) renders erythrocytes immune to P. falciparum
invasion.

The specificities of binding and correlation to invasion by the parasite thus indicate that DABP
and SABP are the proteins of P. vivax and P. falciparum which interact with the Duffy antigen and sialic acids,
respectively, on the erythrocyte. The genes encoding both proteins have been cloned and the DNA and predicted
protein sequences have been determined (B. Kim Lee Sim, et al., J. Cell Biol. 111: 1877-1884 (1990); Fang, X., et al.,
Mol. Biochem. Parasitol. 44: 125-132 (1991)).

Despite considerable research efforts worldwide, because of the complexity of the Plasmodium
parasite and its interaction with its host, it has not been possible to discover a satisfactory solution for prevention
or abatement of the blood stage of malaria. Because malaria is such a large worldwide health problem, there is
a need for methods that abate the impact of this disease. The present invention provides effective preventive and
therapeutic measures against Plasmodium invasion.

SUMMARY OF THE INVENTION
The present invention provides compositions comprising isolated DABP binding domain
polypeptides and/or isolated SABP binding domain polypeptides. The DABP binding domain polypeptides preferably
comprise between about 200 and about 300 amino acid residues, while the SABP binding domain polypeptides
preferably comprise between about 200 and about 600 amino acid residues. A preferred DABP binding domain
polypeptide has about 325 residues of the amino acid sequence found in SEQ ID NO:2. A preferred SABP binding
domain polypeptide has about 616 residues of the amino acid sequence of SEQ ID NO:4, encoded by the DNA
sequence of SEQ ID NO:3. The preferred DABP binding domain and SABP binding domain include the cysteine-rich
portions of the proteins shown in Figure 1.

The present invention also includes pharmaceutical compositions comprising a pharmaceutically 
acceptable carrier and an isolated DABP binding domain polypeptide in an amount sufficient to induce a protective 
immune response to Plasmodium vivax merozoites in an organism. In addition, isolated SABP binding domain 
polypeptide in an amount sufficient to induce a protective immune response to Plasmodium falciparum may be added 
to the pharmaceutical composition. 

Also provided are pharmaceutical compositions comprising a pharmaceutically acceptable carrier 
and an isolated SABP binding domain polypeptide in an amount sufficient to induce a protective immune response 
to Plasmodium falciparum merozoites in an organism. In addition, isolated DABP binding domain polypeptide in an 
amount sufficient to induce a protective immune response to Plasmodium vivax may be added to the pharmaceutical
composition. 

Isolated polynucleotides which encode DABP binding domain polypeptides or SABP binding domain
polypeptides are also disclosed. In addition, the present invention includes a recombinant cell comprising the
polynucleotide encoding the DABP binding domain polypeptide.

The current invention further includes methods of inducing a protective immune response to
Plasmodium merozoites in a patient. The methods comprise administering to the patient an immunologically effective
amount of a pharmaceutical composition comprising a pharmaceutically acceptable carrier and an isolated DABP
binding domain polypeptide, an SABP binding domain polypeptide or a combination thereof.

The present disclosure also provides DNA sequences from additional P. falciparum genes in the
Duffy-binding like (DBL) family that have regions conserved with the P. falciparum 175 kD and P. vivax 135 kD
binding proteins.

DEFINITIONS 

As used herein a "DABP binding domain polypeptide" or a "SABP binding domain polypeptide" are 
polypeptides substantially identical (as defined below) to a sequence from the cysteine rich, amino-terminal region of 
the Duffy antigen binding protein (DABP) or sialic acid binding protein (SABP), respectively. Such polypeptides are 
capable of binding either the Duffy antigen or sialic acid residues on glycophorin. In particular, DABP binding domain
polypeptides consist of amino acid residues substantially similar to a sequence of SABP within a binding domain
containing the cysteine-rich sequence shown in Figure 1. SABP binding domain polypeptides consist of residues
substantially similar to a sequence of DABP within a binding domain containing the cysteine-rich sequence shown
in Figure 1.

The binding domain polypeptides encoded by the genes of the DBL family consist of those residues
substantially identical to the sequence of the binding domains of DABP and SABP as defined above. The DBL family
comprises sequences with substantial similarity to the conserved regions of the DABP and SABP. These include
those sequences reported here as ebl-1 (SEQ ID NO:5 and SEQ ID NO:6), E31a (SEQ ID NO:7 and SEQ ID NO:8), var-7
(SEQ ID NO:13 and SEQ ID NO:14, GenBank Accession No. L42636) and var-1 (SEQ ID NO:15 and SEQ ID
NO:16, GenBank Accession No. L40608). The sequence ebl-2 (SEQ ID NO:9 and SEQ ID NO:10) represents the
binding domains of var-7, and Proj3 (SEQ ID NO:11 and SEQ ID NO:12) is the binding domain of var-1. The DBL
family also includes two other members, var-2 and var-3 (GenBank Accession No. L40609).

The polypeptides of the invention can consist of the full length binding domain or a fragment
thereof. Typically DABP binding domain polypeptides will consist of from about 50 to about 325 residues, preferably
between about 75 and 300, more preferably between about 100 and about 250 residues. SABP binding domain
polypeptides will consist of from about 50 to about 616 residues, preferably between about 75 and 300, more
preferably between about 100 and about 250 residues.

Particularly preferred polypeptides of the invention are those within the binding domain that are 
conserved between SABP and the DBL family. Residues within these conserved domains are shown in Figure 1, 
below. 

Two polynucleotides or polypeptides are said to be "identical" if the sequence of nucleotides or
amino acid residues in the two sequences is the same when aligned for maximum correspondence. Optimal alignment
of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl.
Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970),
by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), by
computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics
Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by inspection. The
term "substantial identity" means that a polypeptide comprises a sequence that has at least 80% sequence identity,
preferably 90%, more preferably 95% or more, compared to a reference sequence over a comparison window of
about 20 residues to about 600 residues, typically about 50 to about 500 residues, usually about 250 to 300
residues. The values of percent identity are determined using the programs above. Particularly preferred peptides
of the present invention comprise a sequence in which at least 70% of the cysteine residues conserved in DABP and
SABP are present. Additionally, the peptide will comprise a sequence in which at least 50% of the tryptophan
residues conserved in DABP and SABP are present. The term substantial similarity is also specifically defined here
with respect to those amino acid residues found to be conserved between DABP, SABP and the sequences of the
DBL family. These conserved amino acids consist prominently of tryptophan and cysteine residues conserved among
all sequences reported here. In addition, the conserved amino acid residues include phenylalanine residues which may
be substituted with tyrosine. These amino acid residues may be determined to be conserved after the sequences
have been aligned using methods outlined above by someone skilled in the art.
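
By way of illustration only, the percent identity over an already aligned comparison window, and the fraction of conserved cysteine or tryptophan positions retained by a candidate peptide, might be computed as in the following sketch. The function names, the gap character, the choice to skip gapped columns, and the toy sequences are all assumptions made for this example; they are not taken from the sequence listing, and the alignment itself is assumed to have been produced by one of the programs named above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns (ignoring gapped columns) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the window
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def fraction_conserved(aligned_ref, aligned_query, residue):
    """Fraction of reference positions holding `residue` that the query also holds."""
    positions = [i for i, x in enumerate(aligned_ref) if x == residue]
    if not positions:
        return 0.0
    kept = sum(1 for i in positions if aligned_query[i] == residue)
    return kept / len(positions)


# Hypothetical aligned fragments, for illustration only.
ref = "CKWAC-FDCW"
query = "CRWSCAFDSW"

identity = percent_identity(ref, query)
cys_fraction = fraction_conserved(ref, query, "C")
trp_fraction = fraction_conserved(ref, query, "W")
print(f"identity={identity:.1f}% cys={cys_fraction:.2f} trp={trp_fraction:.2f}")
```

Under these assumptions, a candidate would be compared against the 80% identity threshold and the 70% cysteine and 50% tryptophan thresholds stated above.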

Another indication that polypeptide sequences are substantially identical is if one protein is
immunologically reactive with antibodies raised against the other protein. Thus, the polypeptides of the invention
include polypeptides immunologically reactive with antibodies raised against the SABP binding domain, the DABP
binding domain or raised against the conserved regions of the DBL family.

Another indication that nucleotide sequences are substantially identical is if two molecules hybridize
to each other under stringent conditions. Stringent conditions are sequence dependent and will be different in
different circumstances. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting
point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined
ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Typically,
stringent conditions will be those in which the salt concentration is about 0.02 molar at pH 7 and the temperature
is at least about 60°C.
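
The rule of thumb above (a temperature about 5°C below the Tm) can be sketched numerically. The text gives no formula for Tm, so the example below assumes the Wallace rule (2°C per A or T, 4°C per G or C), which is only a rough approximation for short oligonucleotides; the function names and the probe sequence are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate Tm in deg C for a short oligonucleotide: 2*(A+T) + 4*(G+C).

    Assumption: the Wallace rule is used here for illustration; it is not
    the method of the specification and ignores salt concentration and pH.
    """
    seq = oligo.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def stringent_temperature(tm, offset=5):
    """Temperature about `offset` deg C below the melting point."""
    return tm - offset


probe = "ATGCGTACCTGAAGCTTGCA"  # hypothetical 20-mer, not from the sequence listing
tm = wallace_tm(probe)
wash = stringent_temperature(tm)
print(f"Tm ~ {tm} C, stringent temperature ~ {wash} C")
```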

Nucleotide sequences are also substantially identical for purposes of this application when the
polypeptides which they encode are substantially identical. Thus, where one nucleic acid sequence encodes
essentially the same polypeptide as a second nucleic acid sequence, the two nucleic acid sequences are substantially
identical, even if they would not hybridize under stringent conditions due to silent substitutions permitted by the
genetic code (see, Darnell et al. (1990) Molecular Cell Biology, Second Edition, Scientific American Books, W.H.
Freeman and Company, New York, NY, for an explanation of codon degeneracy and the genetic code).
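
As a minimal sketch of the point about silent substitutions, the following example translates two different DNA sequences with the standard genetic code and shows that they encode the same peptide. The two sequences are hypothetical and chosen only to illustrate codon degeneracy; the helper names are likewise assumptions of this example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 1; a trailing partial codon is ignored."""
    seq = dna.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical sequences differing only at third codon positions.
seq_a = "TGTTGGTTTAAA"
seq_b = "TGCTGGTTCAAG"

protein_a = translate(seq_a)
protein_b = translate(seq_b)
nucleotide_differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
print(protein_a, protein_b, nucleotide_differences)
```

Here the two sequences differ at three nucleotide positions yet encode the same four residues, which is the situation the definition above is meant to cover.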

The phrases "isolated" or "biologically pure" refer to material which is substantially or essentially
free from components which normally accompany it as found in its native state. Thus, the binding domain
polypeptides of this invention do not contain materials normally associated with their in situ environment, e.g., other
proteins from a merozoite membrane. Typically, isolated proteins of the invention are at least about 80% pure,
usually at least about 90%, and preferably at least about 95% as measured by band intensity on a silver stained
gel.

Protein purity or homogeneity may be indicated by a number of means well known in the art, such
as polyacrylamide gel electrophoresis of a protein sample, followed by visualization upon staining. For certain
purposes high resolution will be needed and HPLC or a similar means for purification utilized.

The term "residue" refers to an amino acid (D or L) or amino acid mimetic incorporated in an
oligopeptide by an amide bond or amide bond mimetic. An amide bond mimetic of the invention includes peptide
backbone modifications well known to those skilled in the art.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 represents an alignment of the predicted amino acid sequences of the DABP binding
domain (Vivax) (SEQ ID NO:25), the two homologous SABP domains (SABP F1 (SEQ ID NO:26) and SABP F2 (SEQ
ID NO:27)) and the sequenced members of the DBL gene family (ebl-1 (SEQ ID NO:28), E31a (SEQ ID NO:29), ebl-2
(SEQ ID NO:30)) and the three homologous Proj3 domains (F1 (SEQ ID NO:31), F2 (SEQ ID NO:32) and F3 (SEQ ID
NO:33)).

Figure 2 represents a schematic of the pRE4 cloning vector. 

Figure 3 shows primers useful for isolating sequences encoding the conserved motifs of the
invention. Primers UNIEBP5 (SEQ ID NO:35) and UNIEBP5A (SEQ ID NO:36) encode the amino acid sequence of SEQ
ID NO:34; primers UNIEBP5B (SEQ ID NO:38) and UNIEBP5C (SEQ ID NO:39) encode the amino acid sequence of
SEQ ID NO:37; primers UNIEBP3 (SEQ ID NO:41) and UNIEBP3A (SEQ ID NO:42) encode the amino acid sequence
of SEQ ID NO:40; and primers UNIEBP3B (SEQ ID NO:44) and UNIEBP3C (SEQ ID NO:45) encode the amino acid
sequence of SEQ ID NO:43.

Figure 4 shows the relative position of the E31a ORF on chromosome 7. 

Figure 5 shows a map of a var gene cluster on chromosome 7. Relative positions of four YACs
(PfYEF2, PfYFEB, PfYKFB, PfYED9) are indicated under the chromosome 7 line at the top of the figure. YACs PfYFEB
and PfYKFB lie entirely within a segment linked to CQR in a genetic cross, whereas YACs PfYED9 and PfYEF2 extend
beyond sites (identified by pE53a and pH270.5) that are dissociated from the chloroquine response. The var cluster
extends over a region of 100-150 kb in PfYED9. Exons and introns of the var-1, var-2 and var-3 genes within the
sequenced 40 kb segment are represented by solid and dotted lines, respectively; arrows show the coding direction.
Two more var elements outside of the sequenced region, identified by conserved restriction sites and
cross-hybridization, are indicated by dashed lines (var-2c and var-3c). Bold letters mark repeated restriction sites that
suggest a duplication in the var-2/var-3 and var-2c/var-3c segments. Enzyme recognition sites: A, ApaI; B, BglI; C,
ClaI; D, HindIII; E, HaeIII; H, HincII; K, KpnI; M, BamHI; P, HpaI; S, SmaI. HincII and HaeIII sites outside of the
sequenced region were not mapped. Positions and sizes of inserts from the Dd2 subsegment library are indicated:
a, pE280b; b, pB20.3; c, pBBOO; d, pE21b; e, pB20.24; f, pE32b; h, pE241a; i, pE240a/51d; j, pE33a; k, pB20.23;
l, y1L17BA6; m, pB20.26; n, pB20SU.27; o, p15J2J3. Inserts from the PfYED9 34 kb ApaI-SmaI fragment library:
r, pB3; s, p3G11; t, pJVs; u, p2E10; v, pIG3; w, p2E3; x, p2B6; y, PEIO; z, pJVr; α, pC5; β, p1A3; γ, p1F6; δ,
p3C3; ε, pA2; ζ, p2A9; η, p3C4; θ, pJZn; κ, p3D8.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

The binding of merozoites and schizonts to erythrocytes is mediated by specific binding proteins
on the surface of the merozoite or schizont and is necessary for erythrocyte invasion. In the case of P. falciparum,
this binding involves specific interaction between sialic acid glycophorin residues on the erythrocyte and the sialic
acid binding protein (SABP) on the surface of the merozoite or schizont. The ability of purified SABP to bind
erythrocytes with chemically or enzymatically altered sialic acid residues paralleled the ability of P. falciparum to
invade these erythrocytes. Furthermore, sialic acid deficient erythrocytes neither bind SABP nor support invasion
by P. falciparum. The DNA encoding SABP from P. falciparum has also been cloned and sequenced.

In P. vivax, specific binding to the erythrocytes involves interaction between the Duffy blood group
antigen on the erythrocyte and the Duffy antigen binding protein (DABP) on the merozoite. Duffy binding proteins
were defined biologically as those soluble proteins that appear in the culture supernatant after the infected
erythrocytes release merozoites and that bind to human Duffy positive, but not to human Duffy negative, erythrocytes.
It has been shown that binding of the P. vivax DABP protein to Duffy positive erythrocytes is blocked by antisera
to the Duffy blood group determinants. Purified Duffy blood group antigens also block the binding to erythrocytes.
DABP has also been shown to bind Duffy blood group determinants on Western blots.

Duffy positive blood group determinants on human erythrocytes are essential for invasion of human
erythrocytes by Plasmodium vivax. Both attachment and reorientation of P. vivax merozoites occur equally well on
Duffy positive and negative erythrocytes. A junction then forms between the apical end of the merozoite and the
Duffy-positive erythrocyte, followed by vacuole formation and entry of the merozoite into the vacuole. Junction
formation and merozoite entry into the erythrocyte do not occur on Duffy negative cells, suggesting that the receptor
specific for the Duffy determinant is involved in apical junction formation but not initial attachment. The DNA
sequences encoding the DABP from P. vivax and P. knowlesi have been cloned and sequenced.

P. vivax red cell invasion has an absolute requirement for the Duffy blood group antigen. Isolates
of P. falciparum, however, vary in their dependency on sialic acid for invasion. Certain P. falciparum clones have
been developed which invade sialic acid deficient erythrocytes at normal rates. This suggests that certain strains
of P. falciparum can interact with other ligands on the erythrocyte and so may possess multiple erythrocyte binding
proteins with differing specificities.

A basis for the present invention is the discovery of the binding domains in both DABP and SABP.
Comparison of the predicted protein sequences of DABP and SABP reveals an amino-terminal, cysteine-rich region
in both proteins with a high degree of similarity between the two proteins. The amino-terminal, cysteine-rich region
of DABP contains about 325 amino acids, whereas the amino-terminal, cysteine-rich region of SABP contains about
616 amino acids. This is due to an apparent duplication of the amino-terminal, cysteine-rich region in the SABP
protein. The cysteine residues are conserved between the two regions of SABP and DABP, as are the amino acids
surrounding the cysteine residues and a number of aromatic amino acid residues in this region. The amino-terminal
cysteine-rich region and another cysteine-rich region near the carboxyl-terminus show the most similarity between
the DABP and SABP proteins. The region of the amino acid sequence between these two cysteine-rich regions shows
only limited similarity between DABP and SABP.

Other P. falciparum open reading frames and genes with regions that have substantial identity to
binding domains of SABP and DABP have been identified. Multiple copies of these sequences exist in the parasite
genome, indicating their important activity in host-parasite interactions. A family of these sequences (the DBL family)
has been cloned from chromosome 7 subsegment libraries that were constructed during genetic studies of the
chloroquine resistance locus (Wellems et al., PNAS 88: 3382-3386 (1991)). Certain of these transcripts are known
to be from the var family of genes that modulate cytoadherence and antigenic variation of P. falciparum-infected
erythrocytes (see, Example 3, below).

Genes of the P. falciparum var family encode 200-350 kD variant surface molecules that determine
antigenic and adhesive properties of parasitized erythrocytes. The large repertoire of var genes (50-150 copies,
having sufficient DNA to account for 2-6% of the haploid genome), the dramatic sequence variation among the gene
copies, their variable expression in different parasite lines, the ready detection of DNA rearrangements, and the
receptor binding features of the encoded extracellular domains all implicate var genes as the major determinants of
antigenic variation and cytoadherence in P. falciparum malaria.

A second class of DBL-encoding transcripts includes single-copy genes such as ebl-1. Genetic
linkage studies have placed this gene within a region of chromosome 13 that affects invasion of malarial parasites
in human red blood cells (Wellems et al., Cell 49:633-642 (1987)). Both SABP and ebl-1 show restriction patterns
that are well conserved among different parasite isolates. This conservation of gene structure and the sequence
relationships between the ebl-1 and SABP domains suggest that ebl-1 encodes a novel erythrocyte binding molecule
having receptor properties distinct from those of SABP.

Southern hybridization experiments using probes from these open reading frames have indicated
that additional copies of these conserved sequences are located elsewhere in the genome. The largest of the open
reading frames on chromosome 7 is 8 kilobases and contains four tandem repeats homologous to the N-terminal,
cysteine-rich unit of SABP and DABP.

Figure 1 represents an alignment of the DBL family with the DABP binding domain and two
homologous regions of SABP (F1 and F2). The DBL family is divided into two sub-families to achieve optimal
alignment. Conserved cysteine residues are shown in bold face and conserved aromatic residues are underlined.

The polypeptides of the invention can be used to raise monoclonal antibodies specific for the
binding domains of SABP, DABP or the conserved regions in the DBL gene family. The antibodies can be used for
diagnosis of malarial infection or as therapeutic agents to inhibit binding of merozoites to erythrocytes. The
production of monoclonal antibodies against a desired antigen is well known to those of skill in the art and is not
reviewed in detail here.

The multitude of techniques available to those skilled in the art for production and manipulation
of various immunoglobulin molecules can thus be readily applied to inhibit binding. As used herein, the terms
"immunoglobulin" and "antibody" refer to a protein consisting of one or more polypeptides substantially encoded by
immunoglobulin genes. Immunoglobulins may exist in a variety of forms besides antibodies, including, for example,
Fv, Fab, and F(ab)2, as well as in single chains. For a general review of immunoglobulin structure and function see
Fundamental Immunology, 2d Ed., W.E. Paul ed., Raven Press, N.Y. (1989).

Antibodies which bind polypeptides of the invention may be produced by a variety of means. The
production of non-human monoclonal antibodies, e.g., murine, lagomorpha, equine, etc., is well known and may be
accomplished by, for example, immunizing the animal with a preparation containing the polypeptide.
Antibody-producing cells obtained from the immunized animals are immortalized and screened, or screened first for
the production of antibody which inhibits binding between merozoites and erythrocytes and then immortalized.
For a discussion of general procedures of monoclonal antibody production see Harlow and Lane, Antibodies: A
Laboratory Manual, Cold Spring Harbor Publications, N.Y. (1988).

Thus, the present invention allows targeting of protective immune responses or monoclonal
antibodies to sequences in the binding domains that are conserved between SABP, DABP and encoded regions of the
DBL family. Identification of the binding regions of these proteins facilitates vaccine development because it allows
for a focus of effort upon the functional elements of the large molecules. The particular sequences within the
binding regions refine the target to critical regions that have been conserved during evolution, and are thus preferred
for use as vaccines against the parasite.

The genes of the DBL family (which have not previously been sequenced) can be used as markers 
to detect the presence of the P. falciparum parasite in patients. This can be accomplished by means well known 
to practitioners in the art using tissue or blood from symptomatic patients in PCR reactions with oligonucleotides 
complementary to portions of the genes of the DBL family. Furthermore, sequencing the DBL family provides a 
means for skilled practitioners to generate defined probes to be used as genetic markers in a variety of applications. 

Additionally, the present invention defines a conserved motif present in, but not restricted to other 
members of the subphylum Apicomplexa which participates in host parasite interaction. This motif can be identified 
in Plasmodium species and other parasitic protozoa by the polymerase chain reaction using the synthetic 
oligonucleotide primers shown in Figure 3. PCR methods are described in detail below. These primers are designed 
from regions in the conserved motif showing the highest degree of conservation among DABP, SABP and the DBL 
family. Figure 3 shows these regions and the consensus amino acid sequences derived from them. 
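
As an illustration of how a primer might be derived from a conserved amino acid motif, the sketch below back-translates a short peptide into a degenerate oligonucleotide written in IUPAC nucleotide codes. The motif "WFKC" is hypothetical and is not one of the consensus sequences of Figure 3, and the source does not state that the Figure 3 primers were designed this way; the function names are assumptions of this example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append("".join(codon))

# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
POOL = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_primer(peptide):
    """Back-translate a peptide, merging all codons of each residue position by position.

    Note: for residues with six codons (L, S, R) this per-position merge also
    admits some codons for other residues, so it over-represents the pool.
    """
    primer = []
    for aa in peptide.upper():
        codons = CODONS[aa]
        for pos in range(3):
            primer.append(IUPAC[frozenset(c[pos] for c in codons)])
    return "".join(primer)


def pool_size(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    size = 1
    for code in primer:
        size *= POOL[code]
    return size


motif = "WFKC"  # hypothetical conserved motif (tryptophan, phenylalanine, lysine, cysteine)
primer = degenerate_primer(motif)
degeneracy = pool_size(primer)
print(primer, degeneracy)
```

Motifs rich in tryptophan and cysteine, as the conserved regions described here are, give low degeneracy because those residues have one and two codons respectively.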

A. General Methods

Much of the nomenclature and general laboratory procedures required in this application can be
found in Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory,
Cold Spring Harbor, NY, 1989. The manual is hereinafter referred to as "Sambrook, et al., 1989."

The practice of this invention involves the construction of recombinant nucleic acids and the
expression of genes in transfected cells. Molecular cloning techniques to achieve these ends are known in the art.
A wide variety of cloning and in vitro amplification methods suitable for the construction of recombinant nucleic acids
are well-known to persons of skill. Examples of these techniques and instructions sufficient to direct persons of skill
through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods
in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); and Current Protocols in Molecular Biology,
F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John
Wiley & Sons, Inc., (1994 Supplement) (Ausubel).

Examples of techniques sufficient to direct persons of skill through in vitro amplification methods,
including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ replicase amplification and other
RNA polymerase mediated techniques are found in Berger, Sambrook et al., 1989, and Ausubel, as well as Mullis
et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al. eds.),
Academic Press Inc., San Diego, CA (1990) ("Innis"); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The
Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al.
(1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem. 35, 1826; Landegren et al. (1988)
Science 241, 1077-1080; Van Brunt (1990) Biotechnology 291-294; Wu and Wallace (1989) Gene 4, 560; and
Barringer et al. (1990) Gene 89, 117. Improved methods of cloning in vitro amplified nucleic acids are described
in Wallace et al., U.S. Pat. No. 5,426,039.

The culture of cells used in the present invention, including cell lines and cultured cells from tissue
or blood samples, is well known in the art. Freshney (Culture of Animal Cells, a Manual of Basic Technique, third
ed., Wiley-Liss, New York, NY (1994)) and the references cited therein provide a general guide to the culture of
cells.

DBL genes are optionally bound by antibodies in one of the embodiments of the present invention.
Methods of producing polyclonal and monoclonal antibodies are known to those of skill in the art. See, e.g., Coligan
(1991) Current Protocols in Immunology, Wiley/Greene, NY; and Harlow and Lane (1989) Antibodies: A Laboratory
Manual, Cold Spring Harbor Press, NY; Stites et al. (eds.) Basic and Clinical Immunology (4th ed.), Lange Medical
Publications, Los Altos, CA, and references cited therein; Goding (1986) Monoclonal Antibodies: Principles and
Practice (2d ed.), Academic Press, New York, NY; and Kohler and Milstein (1975) Nature 256: 495-497. Other
suitable techniques for antibody preparation include selection of libraries of recombinant antibodies in phage or similar
vectors. See, Huse et al. (1989) Science 246: 1275-1281; and Ward, et al. (1989) Nature 341: 544-546. Specific
monoclonal and polyclonal antibodies will usually bind with a KD of at least about 0.1 mM, more usually at least
about 1 µM, and most preferably at least about 0.1 µM or better.

B. Methods for isolating DNA encoding SABP, DABP and DBL binding regions

The nucleic acid compositions of this invention, whether RNA, cDNA, genomic DNA, or a hybrid
of the various combinations, may be isolated from natural sources or may be synthesized in vitro. The nucleic acids
claimed may be present in transformed or transfected whole cells, in a transformed or transfected cell lysate, or in
a partially purified or substantially pure form.

Techniques for nucleic acid manipulation of genes encoding the binding domains of the invention,
such as subcloning nucleic acid sequences encoding polypeptides into expression vectors, labelling probes, DNA
hybridization, and the like are described generally in Sambrook et al., 1989.

Recombinant DNA techniques can be used to produce the binding domain polypeptides. In general,
the DNA encoding the SABP and DABP binding domains is first cloned or isolated in a form suitable for ligation
into an expression vector. After ligation, the vectors containing the DNA fragments or inserts are introduced into
a suitable host cell for expression of the recombinant binding domains. The polypeptides are then isolated from the
host cells.

There are various methods of isolating the DNA sequences encoding the SABP, DABP and DBL
binding domains. Typically, the DNA is isolated from a genomic or cDNA library using labelled oligonucleotide probes
specific for sequences in the DNA. Restriction endonuclease digestion of genomic DNA or cDNA containing the
appropriate genes can be used to isolate the DNA encoding the binding domains of these proteins. Since the DNA
sequences of the SABP and DABP genes are known, a panel of restriction endonucleases can be constructed to give
cleavage of the DNA in the desired regions. After restriction endonuclease digestion, DNA encoding SABP binding
domain or DABP binding domain is identified by its ability to hybridize with nucleic acid probes, for example on
Southern blots, and these DNA regions are isolated by standard methods familiar to those of skill in the art. See
Sambrook, et al., 1989.
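
As a rough illustration of how a panel of restriction endonucleases might be checked against a known sequence, the following Python sketch locates recognition sites and lists the fragments a digest of a linear molecule would produce. The example sequence is invented for illustration and is not SABP or DABP sequence; the function names and the choice of EcoRI, BamHI and HindIII are likewise assumptions, and only the top-strand cut position is modelled.

```python
# Illustrative sketch only: the sequence below is made up, not SABP/DABP data.

# Recognition site and top-strand cut offset within the site for three
# common type II restriction endonucleases.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def cut_positions(seq, site, offset):
    """Return 0-based top-strand cut positions for one enzyme in a linear sequence."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme_names):
    """Return the fragments produced by cutting a linear sequence with the named enzymes."""
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical insert flanked by one EcoRI site and one BamHI site.
example = "TTACGGAATTCATGAAATGTAACATTAGCGGATCCTTAGC"
fragments = digest(example, ["EcoRI", "BamHI"])
for frag in fragments:
    print(len(frag), frag)
```

In practice the predicted fragment sizes would be compared with bands on a gel or Southern blot; the sketch covers only the in silico step.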

The polymerase chain reaction can also be used to prepare DABP, SABP and DBL binding domain DNA.
Polymerase chain reaction technology (PCR) is used to amplify nucleic acid sequences of the DABP and SABP binding
domains directly from mRNA, from cDNA, and from genomic libraries or cDNA libraries. The primers shown in Figure
3 are particularly preferred for this process.

Appropriate primers and probes for amplifying the SABP and DABP binding region DNAs are
generated from analysis of the DNA sequences. In brief, oligonucleotide primers complementary to the two 3' borders
of the DNA region to be amplified are synthesized. The polymerase chain reaction is then carried out using the two
primers. See PCR Protocols: A Guide to Methods and Applications (Innis, M., Gelfand, D., Sninsky, J. and White,
T., eds.), Academic Press, San Diego, CA (1990). Primers can be selected to amplify the entire DABP regions or
to amplify smaller segments of the DABP and SABP binding domains, as desired.
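
The primer placement described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the template is an invented sequence, the primer length of 8 is chosen only to keep the example short, and the helper names are hypothetical. It does not reproduce the primers of Figure 3 and ignores melting temperature, secondary structure and mispriming.

```python
# Illustrative sketch only: template, coordinates and primer length are assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(template, start, end, primer_len=18):
    """Return (forward, reverse) primers, both 5'->3', flanking template[start:end]."""
    region = template[start:end]
    if len(region) < 2 * primer_len:
        raise ValueError("region is too short for two non-overlapping primers")
    # Forward primer matches the top strand, so it anneals to the 3' border
    # of the bottom strand.
    forward = region[:primer_len]
    # Reverse primer is complementary to the 3' border of the top strand.
    reverse = reverse_complement(region[-primer_len:])
    return forward, reverse


def amplicon(template, forward, reverse):
    """Return the predicted product for one perfect match of each primer, else None."""
    i = template.find(forward)
    j = template.find(reverse_complement(reverse))
    if i == -1 or j == -1 or j < i:
        return None
    return template[i:j + len(reverse)]


# Hypothetical template: 4 flanking bases, a 45-base target region, 4 flanking bases.
template = "GGCC" + "ATGAAATGTAACATTAGCGTTTACGACCTGAAAGGTCCATGGTAA" + "CCGG"
fwd, rev = design_primers(template, 4, 49, primer_len=8)
product = amplicon(template, fwd, rev)
print(fwd, rev, len(product))
```

Moving `start` and `end` inward selects a smaller segment of the region, which corresponds to amplifying part of a binding domain rather than the whole.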

Oligonucleotides for use as probes are chemically synthesized according to the solid phase 
phosphoramidite triester method first described by Beaucage, S.L. and Caruthers, M.H., 1981, Tetrahedron Letts., 
22(20):1859-1862 using an automated synthesizer, as described in Needham-VanDevanter, D.R., et al., 1984, Nucleic
Acids Res., 12:6159-6168. Purification of oligonucleotides is by either native acrylamide gel electrophoresis or by 
anion exchange HPLC as described in Pearson, J.D. and Regnier, F.E., 1983, J. Chrom., 255:137-149. 

The sequence of the synthetic oligonucleotides can be verified using the chemical degradation
method of Maxam, A.M. and Gilbert, W., 1980, in Grossman, L. and Moldave, D., eds., Academic Press, New York,
NY, Methods in Enzymology 65:499-560.

Other methods known to those of skill in the art may also be used to isolate DNA encoding all
or part of the SABP or DABP binding domains. See Sambrook, et al., 1989.

C. Expression of DABP, SABP and DBL Binding Domain Polypeptides
Once binding domain DNAs are isolated and cloned, one may express the desired polypeptides in 
a recombinantly engineered cell such as bacteria, yeast, insect (especially employing baculoviral vectors), and 
mammalian cells. It is expected that those of skill in the art are knowledgeable in the numerous expression systems 
available for expression of the DNA encoding the DABP and SABP binding domains. No attempt to describe in detail 
the various methods known for the expression of proteins in prokaryotes or eukaryotes will be made. 

In brief summary, the expression of natural or synthetic nucleic acids encoding binding domains 
will typically be achieved by operably linking the DNA or cDNA to a promoter (which is either constitutive or 
inducible), followed by incorporation into an expression vector. The vectors can be suitable for replication and 
integration in either prokaryotes or eukaryotes. Typical expression vectors contain transcription and translation
terminators, initiation sequences, and promoters useful for regulation of the expression of the DNA encoding the
binding domains. To obtain high level expression of a cloned gene, it is desirable to construct expression plasmids
which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational 
initiation, and a transcription/translation terminator. 

1. Expression in Prokaryotes

Examples of regulatory regions suitable for this purpose in E. coli are the promoter and operator
region of the E. coli tryptophan biosynthetic pathway as described by Yanofsky, C., 1984, J. Bacteriol.,
158:1018-1024 and the leftward promoter of phage lambda (PL) as described by Herskowitz, I. and Hagen, D., 1980,
Ann. Rev. Genet., 14:399-445. The inclusion of selection markers in DNA vectors transformed in E. coli is also
useful. Examples of such markers include genes specifying resistance to ampicillin, tetracycline, or chloramphenicol.
See Sambrook et al., 1989, for details concerning selection markers for use in E. coli.

The vector is selected to allow introduction into the appropriate host cell. Bacterial vectors are 
typically of plasmid or phage origin. Appropriate bacterial cells are infected with phage vector particles or 
transfected with naked phage vector DNA. If a plasmid vector is used, the bacterial cells are transfected with the
plasmid vector DNA. 

Expression systems for expressing the DABP and SABP binding domains are available using E. coli,
Bacillus sp. (Palva, I. et al., 1983, Gene 22:229-235; Mosbach, K. et al., Nature, 302:543-545) and Salmonella. E.
coli systems are preferred.

The binding domain polypeptides produced by prokaryote cells may not necessarily fold properly.
During purification from E. coli, the expressed polypeptides may first be denatured and then renatured. This can be
accomplished by solubilizing the bacterially produced proteins in a chaotropic agent such as guanidine HCl and
reducing all the cysteine residues with a reducing agent such as beta-mercaptoethanol. The polypeptides are then
renatured, either by slow dialysis or by gel filtration. See U.S. Patent No. 4,511,503.

Detection of the expressed antigen is achieved by methods known in the art such as radioimmunoassays,
Western blotting techniques or immunoprecipitation. Purification from E. coli can be achieved following procedures
described in U.S. Patent No. 4,511,503.

2. Synthesis of SABP, DABP and DBL Binding Domains in Eukaryotes

A variety of eukaryotic expression systems such as yeast, insect cell lines and mammalian cells, 
are known to those of skill in the art. As explained briefly below, the DABP and SABP binding domains may also 
be expressed in these eukaryotic systems. 

a. Expression in Yeast

Synthesis of heterologous proteins in yeast is well known and described. Methods in Yeast
Genetics, Sherman, F., et al., Cold Spring Harbor Laboratory, (1982) is a well recognized work describing the various
methods available to produce the binding domains in yeast.

Examples of promoters for use in yeast include GAL1,10 (Johnson, M., and Davies, R.W., 1984,
Mol. and Cell. Biol., 4:1440-1448), ADH2 (Russell, D., et al., 1983, J. Biol. Chem., 258:2674-2682), PHO5 (EMBO
J. 6:675-680, 1982), and MFα1 (Herskowitz, I. and Oshima, Y., 1982, in The Molecular Biology of the Yeast
Saccharomyces, eds. Strathern, J.N., Jones, E.W., and Broach, J.R., Cold Spring Harbor Lab., Cold Spring Harbor,
N.Y., pp. 181-209). A multicopy plasmid with a selective marker such as Leu-2, URA-3, Trp-1, and His-3 is also
desirable.

A number of yeast expression plasmids like YEp6, YEp13, YEp4 can be used as vectors. A gene
of interest can be fused to any of the promoters in various yeast vectors. The above-mentioned plasmids have been
fully described in the literature (Botstein, et al., 1979, Gene, 8:17-24; Broach, et al., 1979, Gene, 8:121-133).

Two procedures are used in transforming yeast cells. In one case, yeast cells are first converted 
into protoplasts using zymolyase, lyticase or glusulase, followed by addition of DNA and polyethylene glycol (PEG). 
The PEG-treated protoplasts are then regenerated in a 3% agar medium under selective conditions. Details of this 
procedure are given in the papers by J.D. Beggs, 1978, Nature (London), 275:104-109; and Hinnen, A., et al., 1978,
Proc. Natl. Acad. Sci. USA, 75:1929-1933. The second procedure does not involve removal of the cell wall. Instead
the cells are treated with lithium chloride or acetate and PEG and put on selective plates (Ito, H., et al., 1983, J.
Bact., 153:163-168).

The binding domains can be isolated from yeast by lysing the cells and applying standard protein
isolation techniques to the lysates. The monitoring of the purification process can be accomplished by using Western
blot techniques or radioimmunoassays or other standard immunoassay techniques.

b. Expression in Mammalian and Insect Cell Cultures 
Illustrative of cell cultures useful for the production of the binding domains are cells of insect or 
mammalian origin. Mammalian cell systems often will be in the form of monolayers of cells although mammalian cell
suspensions may also be used. Illustrative examples of mammalian cell lines include VERO and HeLa cells, Chinese
hamster ovary (CHO) cell lines, WI38, BHK, Cos-7 or MDCK cell lines.

As indicated above, the vector, e.g., a plasmid, which is used to transform the host cell,
preferably contains DNA sequences to initiate transcription and sequences to control the translation of the antigen
gene sequence. These sequences are referred to as expression control sequences. When the host cell is of insect
or mammalian origin illustrative expression control sequences are obtained from the SV-40 promoter (Science,
222:524-527, 1983), the CMV I.E. Promoter (Proc. Natl. Acad. Sci. 81:659-663, 1984) or the metallothionein
promoter (Nature 296:39-42, 1982). The cloning vector containing the expression control sequences is cleaved using
restriction enzymes and adjusted in size as necessary or desirable and ligated with DNA coding for the SABP or
DABP polypeptides by means well known in the art.

As with yeast, when higher animal host cells are employed, polyadenylation or transcription
terminator sequences from known mammalian genes need to be incorporated into the vector. An example of a
terminator sequence is the polyadenylation sequence from the bovine growth hormone gene. Sequences for accurate
splicing of the transcript may also be included. An example of a splicing sequence is the VP1 intron from SV40
(Sprague, J. et al., 1983, J. Virol. 45: 773-781).

Additionally, gene sequences to control replication in the host cell may be incorporated into the
vector such as those found in bovine papilloma virus type-vectors. See Saveria-Campo, M., 1985, "Bovine Papilloma virus
DNA a Eukaryotic Cloning Vector" in DNA Cloning Vol. II a Practical Approach Ed. D.M. Glover, IRL Press, Arlington,
Virginia pp. 213-238.

The host cells are competent or rendered competent for transformation by various means. There
are several well-known methods of introducing DNA into animal cells. These include: calcium phosphate precipitation,
fusion of the recipient cells with bacterial protoplasts containing the DNA, treatment of the recipient cells with
liposomes containing the DNA, DEAE dextran, electroporation and micro-injection of the DNA directly into the cells.

The transformed cells are cultured by means well known in the art. See Biochemical Methods in Cell
Culture and Virology, Kuchler, R.J., Dowden, Hutchinson and Ross, Inc., (1977). The expressed DABP and SABP
binding domain polypeptides are isolated from cells grown as suspensions or as monolayers. The latter are recovered
by well known mechanical, chemical or enzymatic means.

c. Expression in recombinant vaccinia virus- or adenovirus-infected cells 
In addition to use in recombinant expression systems, the isolated binding domain DNA sequences 
can also be used to transform viruses that transfect host cells in the patient. Live attenuated viruses, such as 
vaccinia or adenovirus, are convenient alternatives for vaccines because they are inexpensive to produce and are 
easily transported and administered. Vaccinia vectors and methods useful in immunization protocols are described,
for example, in U.S. Patent No. 4,722,848. 

Suitable viruses for use in the present invention include, but are not limited to, pox viruses, such 
as canarypox and cowpox viruses, and vaccinia viruses, alpha viruses, adenoviruses, and other animal viruses. The 
recombinant viruses can be produced by methods well known in the art, for example, using homologous recombination 
or ligating two plasmids. A recombinant canarypox or cowpox virus can be made, for example, by inserting the
DNAs encoding the DABP and SABP binding domain polypeptides into plasmids so that they are flanked by viral
sequences on both sides. The DNAs encoding the binding domains are then inserted into the virus genome through
homologous recombination. 

A recombinant adenovirus can be produced, for example, by ligating together two plasmids each 
containing about 50% of the viral sequence and the DNA sequence encoding erythrocyte binding domain polypeptide.
Recombinant RNA viruses such as the alpha virus can be made via a cDNA intermediate using methods known in 
the art. 

In the case of vaccinia virus (for example, strain WR), the DNA sequence encoding the binding 
domains can be inserted in the genome by a number of methods including homologous recombination using a transfer 
vector, pTKgpt-OFIS as described in Kaslow, et al., Science 252:1310-1313 (1991).

Alternately the DNA encoding the SABP and DABP binding domains may be inserted into another 
plasmid designed for producing recombinant vaccinia, such as pGS62, Langford, C.L., et al., 1986, Mol. Cell. Biol.
8:3191-3199. This plasmid consists of a cloning site for insertion of foreign genes, the P7.5 promoter of vaccinia
to direct synthesis of the inserted gene, and the vaccinia TK gene flanking both ends of the foreign gene.

Confirmation of production of recombinant virus can be achieved by DNA hybridization using cDNA
encoding the DABP and SABP binding domain polypeptides and by immunodetection techniques using antibodies
specific for the expressed binding domain polypeptides. Virus stocks may be prepared by infection of cells such as
HeLa S3 spinner cells and harvesting of virus progeny.

The recombinant virus of the present invention can be used to induce anti-SABP and anti-DABP
binding domain antibodies in mammals, such as mice or humans. In addition, the recombinant virus can be used to 
produce the SABP and DABP binding domains by infecting host cells in vitro, which in turn express the polypeptide 
(see section on expression of SABP and DABP binding domains in eukaryotic cells, above). 

The present invention also relates to host cells infected with the recombinant virus. The host cells 
of the present invention are preferably mammalian, such as BSC-1 cells. Host cells infected with the recombinant
virus express the DABP and SABP binding domains on their cell surfaces. In addition, membrane extracts of the 
infected cells induce protective antibodies when used to inoculate or boost previously inoculated mammals. 

D. Purification of the SABP, DABP and DBL Binding Domain Polypeptides
The binding domain polypeptides produced by recombinant DNA technology may be purified by
standard techniques well known to those of skill in the art. Recombinantly produced binding domain polypeptides
can be directly expressed or expressed as a fusion protein. The protein is then purified by a combination of cell lysis
(e.g., sonication) and affinity chromatography. For fusion products, subsequent digestion of the fusion protein with
an appropriate proteolytic enzyme releases the desired SABP and DABP binding domains.

The polypeptides of this invention may be purified to substantial purity by standard techniques
well known in the art, including selective precipitation with such substances as ammonium sulfate, column
chromatography, immunopurification methods, and others. See, for instance, R. Scopes, Protein Purification:
Principles and Practice, Springer-Verlag, New York, NY (1982).

E. Production of Binding Domains by protein chemistry techniques
The polypeptides of the invention can be synthetically prepared in a wide variety of ways. For
instance, polypeptides of relatively short size can be synthesized in solution or on a solid support in accordance with
conventional techniques. Various automatic synthesizers are commercially available and can be used in accordance
with known protocols. See, for example, Stewart and Young, Solid Phase Peptide Synthesis, 2d. ed., Pierce Chemical
Co. (1984).

Alternatively, purified and isolated SABP, DABP or DBL family proteins may be treated with 
proteolytic enzymes in order to produce the binding domain polypeptides. For example, recombinant DABP and SABP 
proteins may be used for this purpose. The DABP and SABP protein sequence may then be analyzed to select 
proteolytic enzymes to be used to generate polypeptides containing desired regions of the DABP and SABP binding 
domain. The desired polypeptides are then purified by using standard techniques for protein and peptide purification. 
For a review of standard techniques see Methods in Enzymology, "Guide to Protein Purification", M. Deutscher, ed.,
Vol. 182 (1990), pages 619-826.
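
Selecting a proteolytic enzyme from sequence analysis can be illustrated with a small in silico digest. The Python sketch below uses the commonly cited trypsin rule (cleavage C-terminal to lysine or arginine, except before proline) and reports which predicted peptides fully contain a region of interest. The protein string is invented for illustration, and the function names and coordinates are hypothetical; real digests also produce missed cleavages, which this sketch ignores.

```python
# Illustrative sketch only: the protein sequence below is made up.


def tryptic_peptides(protein):
    """Split a protein after each K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptides_covering(protein, region_start, region_end):
    """Return (start, end, peptide) for peptides fully containing protein[region_start:region_end]."""
    hits, pos = [], 0
    for pep in tryptic_peptides(protein):
        start, end = pos, pos + len(pep)
        if start <= region_start and region_end <= end:
            hits.append((start, end, pep))
        pos = end
    return hits


protein = "MAGKPTLDERCSWVKNNFYHGRAEEL"  # hypothetical sequence
print(tryptic_peptides(protein))
print(peptides_covering(protein, 11, 14))  # region "SWV"
```

If no single peptide covers the desired region, the same check can be repeated with the cleavage rule of a different enzyme.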

F. Modification of nucleic acid and polypeptide sequences

The nucleotide sequences used to transfect the host cells used for production of recombinant
binding domain polypeptides can be modified according to standard techniques to yield binding domain polypeptides
with a variety of desired properties. The binding domain polypeptides of the present invention can be readily
designed and manufactured utilizing various recombinant DNA techniques well known to those skilled in the art. For
example, the binding domain polypeptides can vary from the naturally-occurring sequence at the primary structure
level by amino acid insertions, substitutions, deletions, and the like. These modifications can be used in a number
of combinations to produce the final modified protein chain.

The amino acid sequence variants can be prepared with various objectives in mind, including 
facilitating purification and preparation of the recombinant polypeptides. The modified polypeptides are also useful 
for modifying plasma half-life, improving therapeutic efficacy, and lessening the severity or occurrence of side effects 
during therapeutic use. The amino acid sequence variants are usually predetermined variants not found in nature but
exhibit the same immunogenic activity as naturally occurring polypeptides. For instance, polypeptide fragments
comprising only a portion (usually at least about 60-80%, typically 90-95%) of the primary structure may be 
produced. For use as vaccines, polypeptide fragments are typically preferred so long as at least one epitope capable 
of eliciting production of blocking antibodies remains. 

In general, modifications of the sequences encoding the binding domain polypeptides may be readily
accomplished by a variety of well-known techniques, such as site-directed mutagenesis (see, Gillman and Smith, Gene
8:81-97 (1979) and Roberts, S. et al., Nature 328:731-734 (1987)). One of ordinary skill will appreciate that the
effect of many mutations is difficult to predict. Thus, most modifications are evaluated by routine screening in a
suitable assay for the desired characteristic. For instance, changes in the immunological character of the polypeptide
can be detected by an appropriate competitive binding assay. Modifications of other properties such as redox or
thermal stability, hydrophobicity, susceptibility to proteolysis, or the tendency to aggregate are all assayed according
to standard techniques.

G. Diagnostic and Screening Assays 

The polypeptides and nucleic acids of the invention can be used in diagnostic applications for the
detection of merozoites or nucleic acids in a biological sample. The presence of parasites can be detected using
several well recognized specific binding assays based on immunological results. (See U.S. Patents 4,366,241;
4,376,110; 4,517,288; and 4,837,168). For instance, labeled monoclonal antibodies to polypeptides of the invention
can be used to detect merozoites in a biological sample. Alternatively, labelled polypeptides of the invention can be
used to detect the presence of antibodies to SABP or DABP in a biological sample. For a review of the general
procedures in diagnostic immunoassays, see also Basic and Clinical Immunology 7th Edition (D. Stites and A. Terr
ed.) 1991.

In addition, modified polypeptides, antibodies or other compounds capable of inhibiting the
interaction between SABP or DABP and erythrocytes can be assayed for biological activity. For instance,
polypeptides can be recombinantly expressed on the surface of cells and the ability of the cells to bind erythrocytes
can be measured as described below. Alternatively, peptides or antibodies can be tested for the ability to inhibit binding
between erythrocytes and merozoites or SABP and DABP.

Cell-free assays can also be used to measure binding of DABP or SABP polypeptides to isolated Duffy
antigen or glycophorin polypeptides. For instance, the erythrocyte proteins can be immobilized on a solid surface and 
binding of labelled SABP or DABP polypeptides can be measured. 

Many assay formats employ labelled assay components. The labelling systems can be in a variety of forms. 
The label may be coupled directly or indirectly to the desired component of the assay according to methods well
known in the art. A wide variety of labels may be used. The component may be labelled by any one of several
methods. The most common method of detection is the use of autoradiography with 3H, 125I, 35S, 14C, or 32P
labelled compounds or the like. Non-radioactive labels include ligands which bind to labelled antibodies, fluorophores,
chemiluminescent agents, enzymes, and antibodies which can serve as specific binding pair members for a labelled
ligand. The choice of label depends on sensitivity required, ease of conjugation with the compound, stability
requirements, and available instrumentation. 

In addition, the polypeptides of the invention can be assayed using animal models, well known to those
of skill in the art. For P. falciparum the in vivo models include Aotus sp. monkeys or chimpanzees; for P. vivax the
in vivo models include Saimiri monkeys.

In the case of the use of nucleic acids for diagnostic purposes, standard nucleic acid hybridization
techniques can be used to detect the presence of the genes identified here (e.g., members of the DBL family). If
desired, nucleic acids in the sample may first be amplified using standard procedures such as PCR. Diagnostic kits
comprising the appropriate primers and probes can also be prepared.

H. DBL Targeted Therapeutics
DBL polypeptides are expressed on the surface of Plasmodium-infected erythrocytes. As such, they
present ideal targets for therapeutics which target infected erythrocytes. In one preferred embodiment of the
present invention, cytotoxic antibodies or antibody fusion proteins with cytotoxic agents are targeted against DBL
proteins, killing infected erythrocytes and inhibiting the reproduction of Plasmodium in an infected host.

The procedure for attaching a cytotoxic agent to an antibody will vary according to the chemical
structure of the agent. Antibodies and cytotoxic agents are typically bound together chemically or, where the
antibody and cytotoxic agents are both polypeptides, are optionally synthesized recombinantly as a fusion protein.
Polypeptides typically contain a variety of functional groups, e.g., carboxylic acid (COOH) or free amine (-NH2) groups,
which are available for reaction with a suitable functional group on either the antibody or the cytotoxic agent.

Alternatively, antibodies or cytotoxic agents are derivatized to attach additional reactive functional
groups. The derivatization optionally involves attachment of linker molecules such as those available from Pierce
Chemical Company, Rockford, Illinois. A "linker", as used herein, is a molecule that is used to join the nucleic acid
binding molecule to the receptor ligand. The linker is capable of forming covalent bonds to both the antibody and
the cytotoxic agent. Suitable linkers are well known to those of skill in the art and include, but are not limited to,
straight or branched-chain carbon linkers, heterocyclic carbon linkers, or peptide linkers. Where the antibody and the
cytotoxic agent are polypeptides, the linkers are joined to the constituent amino acids through their side groups (e.g.,
through a disulfide linkage to cysteine) or to the alpha carbon amino and carboxyl groups of the terminal amino acids.

A bifunctional linker having one functional group reactive with a group on a particular ligand, and
another group reactive with a nucleic acid binding molecule, can be used to form the desired conjugate. Alternatively,
derivatization can proceed through chemical treatment of the ligand or nucleic acid binding molecule, e.g., glycol
cleavage of the sugar moiety of a glycoprotein with periodate to generate free aldehyde groups. The free aldehyde
groups on the glycoprotein may be reacted with free amine or hydrazine groups on an agent to bind the agent thereto
(See, e.g., U.S. Patent No. 4,671,958). Procedures for generation of free sulfhydryl groups on polypeptides are
known (See, e.g., U.S. Pat. No. 4,659,839).

Many procedures and linker molecules for attachment of various compounds to proteins are known.
See, for example, European Patent Application No. 188,256; U.S. Patent Nos. 4,671,958, 4,659,839, 4,414,148,
4,699,784; 4,680,338; 4,569,789; and 4,589,071; and Borlinghaus et al., Cancer Res. 47: 4071-4075 (1987). In
particular, production of various antibody conjugates is well-known within the art and can be found, for example in
Thorpe et al., Monoclonal Antibodies in Clinical Medicine, Academic Press, pp. 168-190 (1982), Waldmann, Science,
252: 1657 (1991), and U.S. Patent Nos. 4,545,985 and 4,894,443.

A number of antibodies which bind cell surface receptors have been converted to a form suitable
for incorporation into fusion proteins, and similar strategies are used to create fusion-protein antibodies which bind
DBL polypeptides (see Batra et al., Mol. Cell. Biol., 11: 2200-2205 (1991); Batra et al., Proc. Natl. Acad. Sci. USA,
89: 5867-5871 (1992); Brinkmann, et al., Proc. Natl. Acad. Sci. USA, 88: 8616-8620 (1991); Brinkmann et al., Proc.
Natl. Acad. Sci. USA, 90: 547-551 (1993); Chaudhary et al., Proc. Natl. Acad. Sci. USA, 87: 1066-1070 (1990);
Friedman et al., Cancer Res. 53: 334-339 (1993); Kreitman et al., J. Immunol., 149: 2810-2815 (1992); Nicholls
et al., J. Biol. Chem., 268: 5302-5308 (1993); and Wells, et al., Cancer Res., 52: 6310-6317 (1992), respectively).

B. Production of Fusion Proteins

Where the antibody fragment and/or the cytotoxic agents are relatively short polypeptides (i.e.,
less than about 50 amino acids) they are often synthesized using standard chemical peptide synthesis techniques.
Where both molecules are relatively short, a chimeric molecule is optionally synthesized as a single contiguous
polypeptide. Alternatively, the ligand and the nucleic acid binding molecule can be synthesized separately and then
fused chemically.

Solid phase synthesis in which the C-terminal amino acid of the sequence is attached to an
insoluble support followed by sequential addition of the remaining amino acids in the sequence is a preferred method
for the chemical synthesis of the ligands of this invention. Techniques for solid phase synthesis are described by
Barany and Merrifield, Solid-Phase Peptide Synthesis; pp. 3-284 in The Peptides: Analysis, Synthesis, Biology. Vol.
2: Special Methods in Peptide Synthesis, Part A., Merrifield, et al., J. Am. Chem. Soc., 85: 2149-2156 (1963), and
Stewart et al., Solid Phase Peptide Synthesis, 2nd ed., Pierce Chem. Co., Rockford, Ill. (1984).

In a preferred embodiment, the fusion molecules of the invention are synthesized using recombinant
nucleic acid methodology. Generally this involves creating a nucleic acid sequence that encodes the receptor-targeted
fusion molecule, placing the nucleic acid in an expression cassette under the control of a particular promoter,
expressing the protein in a host, isolating the expressed protein and, if required, renaturing the protein. Techniques
sufficient to guide one of skill through such procedures are found in, e.g., Berger, Sambrook, Ausubel, Innis, and
Freshney (all supra).

While the two molecules are often joined directly together, one of skill will appreciate that the 
molecules may be separated by a peptide spacer consisting of one or more amino acids. Generally the spacer will 
have no specific biological activity other than to join the proteins or to preserve some minimum distance or other
spatial relationship between them. However, the constituent amino acids of the spacer may be selected to influence 
some property of the molecule such as the folding, net charge, or hydrophobicity. 
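
The effect of spacer composition on one such property, net charge, can be estimated very roughly in Python. In the sketch below every sequence is hypothetical, the glycine-serine spacer is a common choice assumed here for illustration rather than one named in this specification, and the charge model simply counts lysine and arginine as +1 and aspartate and glutamate as -1, ignoring histidine, the termini and pKa effects.

```python
# Illustrative sketch only: sequences, spacers and the charge model are assumptions.

# Crude side-chain charges near neutral pH.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq):
    """Return a crude net charge by counting K/R as +1 and D/E as -1."""
    return sum(CHARGE.get(residue, 0) for residue in seq)


def fuse(first, second, spacer=""):
    """Join two polypeptides directly or through a peptide spacer."""
    return first + spacer + second


antibody_fragment = "QVKLEESGR"  # hypothetical
cytotoxic_fragment = "DEALKDDE"  # hypothetical

direct = fuse(antibody_fragment, cytotoxic_fragment)
neutral = fuse(antibody_fragment, cytotoxic_fragment, "GGGGS")
basic = fuse(antibody_fragment, cytotoxic_fragment, "GKGKS")

for name, seq in [("direct", direct), ("neutral", neutral), ("basic", basic)]:
    print(name, len(seq), net_charge(seq))
```

A neutral spacer changes only the distance between the two parts, while a spacer containing charged residues also shifts the estimated net charge of the fusion.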

Once expressed, recombinant fusion proteins can be purified according to standard procedures,
including ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis and the like
(see, generally, R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982), Deutscher, Methods in Enzymology Vol.
182: Guide to Protein Purification, Academic Press, Inc. N.Y. (1990)). Substantially pure compositions of about 50
to 95% homogeneity are preferred, and 80 to 95% or greater homogeneity are most preferred for use as therapeutic
agents.

One of skill in the art will recognize that after chemical synthesis, biological expression, or
purification, the fusion molecule may possess a conformation substantially different than the native conformations
of the constituent polypeptides. In this case, it is often necessary to denature and reduce the polypeptide and then
to cause the polypeptide to re-fold into the preferred conformation. Methods of reducing and denaturing proteins
and inducing re-folding are well known to those of skill in the art (See, Debinski et al., J. Biol. Chem., 268: 14065-
14070 (1993); Kreitman and Pastan, Bioconjug. Chem., 4: 581-585 (1993); and Buchner, et al., Anal. Biochem., 205:
263-270 (1992)).

I. Pharmaceutical compositions comprising binding domain polypeptides

The polypeptides of the invention are useful in therapeutic and prophylactic applications for the 
treatment of malaria. Pharmaceutical compositions of the invention are suitable for use in a variety of drug delivery 
systems. Suitable formulations for use in the present invention are found in Remington's Pharmaceutical Sciences, 
Mack Publishing Company, Philadelphia, PA, 17th ed. (1985). For a brief review of methods for drug delivery, see,
Langer, Science 249:1527-1533 (1990).

The polypeptides of the present invention can be used in pharmaceutical and vaccine compositions 
that are useful for administration to mammals, particularly humans. The polypeptides can be administered together 
in certain circumstances, e.g., where infection by both P. falciparum and P. vivax is likely. Thus, a single 
pharmaceutical composition can be used for the treatment or prophylaxis of malaria caused by both parasites. 

The compositions are suitable for single administrations or a series of administrations. When given 
as a series, inoculations subsequent to the initial administration are given to boost the immune response and are 
typically referred to as booster inoculations. 

The pharmaceutical compositions of the invention are intended for parenteral, topical, oral or local 
administration. Preferably, the pharmaceutical compositions are administered parenterally, e.g., intravenously, 
subcutaneously, intradermally, or intramuscularly. Thus, the invention provides compositions for parenteral 
administration that comprise a solution of the agents described above dissolved or suspended in an acceptable carrier, 
preferably an aqueous carrier. A variety of aqueous carriers may be used, e.g., water, buffered water, 0.4% saline, 
0.3% glycine, hyaluronic acid and the like. These compositions may be sterilized by conventional, well-known 
sterilization techniques, or may be sterile filtered. The resulting aqueous solutions may be packaged for use as is, 
or lyophilized, the lyophilized preparation being combined with a sterile solution prior to administration. The 
compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological 
conditions, such as pH adjusting and buffering agents, tonicity adjusting agents, wetting agents and the like, for 
example, sodium acetate, sodium lactate, sodium chloride, potassium chloride, calcium chloride, sorbitan monolaurate, 
triethanolamine oleate, etc. 

For solid compositions, conventional nontoxic solid carriers may be used which include, for example, 
pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharin, talcum, cellulose, glucose, 
sucrose, magnesium carbonate, and the like. For oral administration, a pharmaceutically acceptable nontoxic 
composition is formed by incorporating any of the normally employed excipients, such as those carriers previously 
listed, and generally 10-95% of active ingredient and more preferably at a concentration of 25%-75%. 

For aerosol administration, the polypeptides are preferably supplied in finely divided form along with 
a surfactant and propellent. The surfactant must, of course, be nontoxic, and preferably soluble in the propellent. 
Representative of such agents are the esters or partial esters of fatty acids containing from 6 to 22 carbon atoms, 
such as caproic, octanoic, lauric, palmitic, stearic, linoleic, linolenic, olesteric and oleic acids with an aliphatic 
polyhydric alcohol or its cyclic anhydride. Mixed esters, such as mixed or natural glycerides may be employed. A 
carrier can also be included, as desired, as with, e.g., lecithin for intranasal delivery. 

In certain embodiments, patients with malaria may be treated with SABP or DABP polypeptides 
or other specific blocking agents (e.g., monoclonal antibodies) that prevent binding of Plasmodium merozoites and 
schizonts to the erythrocyte surface. 

The amount administered to the patient will vary depending upon what is being administered, the 
state of the patient and the manner of administration. In therapeutic applications, compositions are administered 
to a patient already suffering from malaria in an amount sufficient to inhibit spread of the parasite through 
erythrocytes and thus cure or at least partially arrest the symptoms of the disease and its complications. An amount 
adequate to accomplish this is defined as a "therapeutically effective dose." Amounts effective for this use will depend 
on the severity of the disease, the particular composition, and the weight and general state of the patient. Generally, 
the dose will be in the range of about 1 mg to about 5 gm per day, preferably about 100 mg per day, for a 70 kg 
patient. 

Alternatively, the polypeptides of the invention can be used prophylactically as vaccines. The 
vaccines of the invention contain as an active ingredient an immunogenically effective amount of the binding domain 
polypeptide or of a recombinant virus as described herein. The immune response may include the generation of 
antibodies; activation of cytotoxic T lymphocytes (CTL) against cells presenting peptides derived from the peptides 
encoded by the SABP, DABP or DBL sequences of the present invention, or other mechanisms well known in the art. 
See, e.g., Paul, Fundamental Immunology, Second Edition (Raven Press, New York, NY) for a description of immune 
response. Useful carriers are well known in the art, and include, for example, thyroglobulin, albumins such as human 
serum albumin, tetanus toxoid, polyamino acids such as poly(D-lysine:D-glutamic acid), influenza, hepatitis B virus core 
protein, hepatitis B virus recombinant vaccine. The vaccines can also contain a physiologically tolerable (acceptable) 
diluent such as water, phosphate buffered saline, or saline, and further typically include an adjuvant. Adjuvants such 
as incomplete Freund's adjuvant, aluminum phosphate, aluminum hydroxide, or alum are materials well known in the 
art. 

The DNA or RNA encoding the SABP or DABP binding domains and the DBL gene family motifs 
may be introduced into patients to obtain an immune response to the polypeptides which the nucleic acid encodes. 
See Wolff et al., Science 247: 1465-1468 (1990), which describes the use of nucleic acids to produce expression of 
the genes which the nucleic acids encode. 

Vaccine compositions containing the polypeptides, nucleic acids or viruses of the invention are 
administered to a patient to elicit a protective immune response against the polypeptide. A "protective immune 
response" is one which prevents or inhibits the spread of the parasite through erythrocytes and thus at least partially 
prevents the symptoms of the disease and its complications. An amount sufficient to accomplish this is defined as 
an "immunogenically effective dose." Amounts effective for this use will depend on the composition, the manner 
of administration, the weight and general state of health of the patient, and the judgment of the prescribing 
physician. For peptide compositions, the general range for the initial immunization (that is, for therapeutic or 
prophylactic administration) is from about 100 µg to about 1 gm of peptide for a 70 kg patient, followed by 
boosting dosages of from about 100 µg to about 1 gm of the polypeptide pursuant to a boosting regimen over 
weeks to months depending upon the patient's response and condition, e.g., by measuring levels of parasite in the 
patient's blood. For nucleic acids, typically 30-1000 µg of nucleic acid is injected into a 70 kg patient, more typically 
about 50-150 µg of nucleic acid is injected into a 70 kg patient, followed by boosting doses as appropriate. 

The following examples illustrate preferred embodiments of the invention. 
EXAMPLE 1: Identification of the amino-terminal, cysteine-rich region of SABP and DABP as binding 
domains for erythrocytes 

1. Expression of the SABP binding domain polypeptide on the surface of Cos cells. 
To demonstrate that the amino-terminal, cysteine-rich region of the SABP protein is the sialic acid binding 
region, this region of the protein was expressed on the surface of mammalian Cos cells in vitro. This DNA sequence 
is from position 1 to position 1848 of the SABP DNA sequence (SEQ ID No 3). Polymerase chain reaction 
technology (PCR) was used to amplify this region of the SABP DNA directly from the cloned gene. 

Sequences corresponding to restriction endonuclease sites for PvuII or ApaI were incorporated into 
the oligonucleotide sequence of the primers used in PCR amplification in order to facilitate insertion of the 
PCR-amplified regions into the pRE4 vector (see below). The specific oligonucleotides, 
5'-ATCGATCAGCTGGGAAGAAATACTTCATCT-3' (SEQ ID NO:17) and 5'-ATCGATGGGCCCCGAAGTTTGTTCATTATT-3' 
(SEQ ID NO:18), were synthesized. These oligonucleotides were used as primers to PCR-amplify the region of the 
DNA sequence encoding the cysteine-rich amino terminal region of the SABP protein. 
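
The placement of the cloning sites within the primers can be checked with a short sketch such as the following. The recognition sequences CAGCTG (PvuII) and GGGCCC (ApaI) are the standard ones for these enzymes; the function name find_sites is hypothetical, and the reverse primer is represented only by its first twelve nucleotides, an assumption made so that the sketch relies solely on the site-bearing 5' portion of that primer.

```python
# Recognition sequences of the two enzymes used for cloning into pRE4.
SITES = {"PvuII": "CAGCTG", "ApaI": "GGGCCC"}


def find_sites(primer):
    """Return {enzyme: 0-based offset} for each site found in the primer."""
    primer = primer.upper()
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


forward = "ATCGATCAGCTGGGAAGAAATACTTCATCT"  # SEQ ID NO:17
reverse_5prime = "ATCGATGGGCCC"  # first 12 nt of SEQ ID NO:18 only (assumption)

print(find_sites(forward))
print(find_sites(reverse_5prime))
```

In both primers the site follows a six-nucleotide 5' extension, which is consistent with the site being placed upstream of the gene-specific portion of the primer.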

PCR conditions were based on the standard described in Saiki, et al., Science 239: 487-491 (1988). 
Template DNA was provided from cloned fragments of the gene encoding SABP which had been spliced and re-cloned 
as a single open-reading frame piece. 

The vector, pRE4, used for expression in Cos cells is shown in Figure 2. The vector has an SV40 origin 
of replication, an ampicillin resistance marker and the Herpes simplex virus glycoprotein D gene (HSV glyd) cloned 
downstream of the Rous sarcoma virus long terminal repeats (RSV LTR). Part of the extracellular domain of the HSV 
glyd gene was excised using the PvuII and ApaI sites in HSV glyd. 

As described above, the PCR oligonucleotide primers contained the PvuII or ApaI restriction sites. 
The PCR amplified DNA fragments obtained above were digested with the restriction enzymes PvuII and ApaI and 
cloned into the PvuII and ApaI sites of the vector pRE4. These constructs were designed to express regions of the 
SABP protein as chimeric proteins with the signal sequence of HSV glyd at the N-terminal end and the 
transmembrane and cytoplasmic domain of HSV glyd at the C-terminal end. The signal sequence of HSV glyd targets 
these chimeric proteins to the surface of Cos cells and the transmembrane segment of HSV glyd anchors these 
chimeric proteins to the Cos cell surface. 

Mammalian Cos cells were transfected with the pRE4 constructs containing the PCR amplified 
SABP DNA regions, by calcium phosphate precipitation according to standard techniques. 

2. Expression of the DABP binding domain polypeptide on the surface of Cos cells. 

To demonstrate that the amino terminal, cysteine-rich region of the DABP protein is the binding 
domain, this region was expressed on the surface of Cos cells. This region of the DNA sequence from position 1-975 
was first PCR-amplified (SEQ ID No 1). 

Sequences corresponding to restriction endonuclease sites for PvuII or ApaI were incorporated into 
the oligonucleotide primers used for PCR amplification in order to facilitate subsequent insertion of the amplified DNA 
into the pRE4 vector, as described above. The oligonucleotides, 5'-TCTCGTCAGCTGACGATCTCTAGTGCTATT-3' (SEQ 
ID NO:19) and 5'-ACGAGTGGGCCCTGTCACAACTTCCTGAGT-3' (SEQ ID NO:20), were synthesized. These 
oligonucleotides were used as primers to amplify the region of the DABP DNA sequence encoding the cysteine-rich, 
amino-terminal region of the DABP protein directly from the cloned DABP gene, using the same conditions described 
above. 

The same pRE4 vector described above in the section on expression of SABP regions in Cos cells 
was also used as a vector for the DABP DNA regions. 

3. Binding studies with erythrocytes. 

To demonstrate their ability to bind human erythrocytes, the transfected Cos cells expressing 
binding domains from DABP and SABP were incubated with erythrocytes for two hours at 37°C in culture media 
(DMEM/10% FBS). The non-adherent erythrocytes were removed with five washes of phosphate-buffered saline and 
the bound erythrocytes were observed by light microscopy. Cos cells expressing the amino terminal, cysteine-rich 
SABP polypeptides on their surface bound untreated human erythrocytes, but did not bind neuraminidase treated 
erythrocytes, that is, erythrocytes which lack sialic acid residues on their surface. Cos cells expressing other regions 
of the SABP protein on their surface did not bind human erythrocytes. These results identified the amino-terminal 
cysteine-rich region of SABP as the erythrocyte binding domain and indicated that the binding of Cos cells expressing 
these regions to human erythrocytes is specific. Furthermore, the binding of the expressed region to erythrocytes 
is identical to the binding pattern seen for the authentic SABP-175 molecule upon binding to erythrocytes. 

Similarly, Cos cells expressing the amino-terminal cysteine-rich region of DABP on their surface 
bound Duffy-positive human erythrocytes, but did not bind Duffy-negative human erythrocytes, that is, erythrocytes 
which lack the Duffy blood group antigen. Cos cells expressing other regions of the DABP protein on their surface 
did not bind human erythrocytes. These results identified the amino-terminal cysteine-rich region of DABP as the 
erythrocyte binding domain and indicated that the binding of the Cos cells was specific. 
EXAMPLE 2: Isolation of polynucleotide sequences in the DBL family 

P. falciparum clones and cell lines used include the following. P. falciparum D10, LF4/1, 
Camp/A1, SL/D6, HB3, 7G8, V1/S, T2/C6, KMWII, ItG2F6, FCR3/A2 and Dd2 have been previously tabulated (Dolan, 
et al. (1993), Mol. Biochem. Parasitol. 61, 137-142). Line Dd2/NM1 was selected from clone Dd2 for invasion via 
a sialic acid-independent pathway (Dolan, et al. (1990), J. Clin. Invest. 86, 618-624). All parasites were maintained 
in vitro by standard methods (Trager, et al. (1976), Science 193, 673-675). 

DNA and RNA Isolation and Analysis. DNA was extracted as described (Peterson, et al. (1990), 
Proc. Natl. Acad. Sci. USA 87, 3018-3022). Endonuclease digestion, agarose gel electrophoresis, and filter 
hybridizations were performed by standard methods (Sambrook, et al., 1989). All hybridizations were at 56°C 
(Sambrook, et al., 1989). Blots were washed for 2 min. at room temperature in 2x standard saline/phosphate/EDTA 
(SSPE) with 0.5% SDS, followed by two higher stringency washes at 50°C in 0.3x SSPE with 0.5% SDS. Parasite 
chromosomes were embedded in agarose blocks and separated by pulsed field gel electrophoresis (Dolan, et al. 
(1993), Methods Mol. Biol. 21, 319-332). RNA was isolated from cultured parasites by LiCl extraction of 
Catrimox-14-precipitated RNA (Dahle, et al. (1993), BioTechniques 15, 1102-1105). Agarose gel electrophoresis of 
total RNA and filter hybridizations were performed by standard methods (Sambrook, et al., 1989). 

Oligonucleotide Primers and PCR. Primers specific for E31a used in a RT-PCR to test for 
expression of this sequence were E31aT2 (5'-AGA-CCT-CAA-TTT-CTA-AG-3') (SEQ ID NO:21) and E31aRev1 
(5'-AAT-CGC-GAG-CAT-CAT-CTG-3') (SEQ ID NO:22). 

Two primers were used to amplify additional sequences from genes encoding DBL domains. These 
were designed from conserved amino acids encoded in the DBL domain of the eba-175 and E31a sequences. After 
adaptation to incorporate the most frequently-used P. falciparum codons, forward primer UNIEBP5' 
[5'-CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG-3'] (SEQ ID NO:23), based upon the amino acid sequence 
PRRQKLC, and reverse primer UNIEBP3' [5'-CCA-(A/T)C(T/G)-(T/G)A(A/G)-(A/G)AA-TTG-(A/T)GG-3'] (SEQ ID NO:24), 
based upon the amino acid sequence PQFLRW, were synthesized. 

RT-PCR amplifications were performed as described (Kawasaki, et al. (1990), PCR Protocols, A 
Guide to Methods and Applications, eds. Innis, M.A., Gelfand, D.H., Sninsky, J.J. & White, T.J. (Academic, San 
Diego), pp. 21-27). In brief, 0.5 to 1 mg of total RNA was treated with RQ1 DNAse (Promega), phenol/chloroform 
extracted, and ethanol precipitated. The RNA was then annealed with random oligonucleotide primers and extended 
with Superscript reverse transcriptase (GIBCO/BRL). PCR cycling conditions were 94°C for 10 sec, 45°C for 15 
sec, and 72°C for 45 sec, for 30 cycles. All PCRs were performed in an Idaho Technology air thermal cycler using 
buffer containing 2 mM Mg2+. 
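
As a worked illustration of the cycling conditions above, the following sketch sums the programmed hold times. It counts only the three hold steps per cycle; temperature ramp times and any initial denaturation or final extension are not stated in the text and are left out, so the figure is a lower bound on run time rather than a measured value. The variable names are illustrative only.

```python
# (temperature in degrees C, hold time in seconds) for one PCR cycle
STEPS = [(94, 10), (45, 15), (72, 45)]
CYCLES = 30

seconds_per_cycle = sum(hold for _, hold in STEPS)
total_seconds = seconds_per_cycle * CYCLES

# Hold time per cycle, total hold time, and the total expressed in minutes.
print(seconds_per_cycle, total_seconds, total_seconds / 60)
```

The short hold times are consistent with the use of a rapid air thermal cycler, as stated in the text.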

PCR amplification products were separated by use of PCR Purity Plus gels and protocols (AT 
Biochem, Malvern, PA). 

DNA Clones and Hybridization Probes. Clone pE31a was isolated from a genomic library 
prepared from the region of chromosome 7 linked to chloroquine resistance (Walker-Jonah, et al. (1992), Mol. 
Biochem. Parasitol. 51, 313-320). Clone pS31H (GenBank accession no. L38454), containing an insert encompassing 
that of pE31a, was cloned from a size-selected Hind III restriction digest of Dd2 genomic DNA. 

Clone pEBLel was cloned from a RT-PCR of Dd2 cDNA after amplification with primers UNIEBP5' 
(SEQ ID NO:23) and UNIEBP3' (SEQ ID NO:24). Clone pEBP1.2 (GenBank accession no. L38450), containing an insert 
encompassing that of pEBLel, was isolated from a Dd2 cDNA library probed with pEBLel. DBL-encoding sequences 
of dbl-nm1-4 (GenBank accession no. L38455) and dbl-nm1-5 (GenBank accession no. L38453) were amplified by 
RT-PCR from first strand cDNA of line Dd2/NM using primers UNIEBP5' and UNIEBP3'. Sequencing was performed 
on double stranded DNA templates by standard protocols for the dideoxynucleotide method (Sequenase; U.S. 
Biochemicals). 

Sequences related to the E31a sequence were detected with the 3005 bp insert of clone pS31H. 
The eba-175 gene was detected with a PCR amplified probe consisting of the first 1825 bp of the coding sequence. 
ebl-1 sequences were detected with the 2098 bp insert of clone pEBP1.2. All probes were comparable in 
organization, each containing a region encoding at least one DBL domain and varying amounts of flanking sequence. 

Homology searches and alignments. Homology searches were performed with BLAST and the 
Genetics Computer Group program FASTA (Altschul, et al. (1990), J. Mol. Biol. 215, 403-410; Devereux, et al. 
(1984), Nucleic Acids Res. 12(1 Pt 1), 387-395). Optimized alignments were produced with MACAW sequence 
alignment software (Schuler, et al. (1991), Proteins 9, 180-190). 

Multiple P. falciparum sequences encode DBL domains. Positional cloning experiments directed 
to P. falciparum chromosome 7 identified an ORF (E31a) encoding a DBL domain that is homologous to the domains 
found in the P. vivax and P. knowlesi DABPs and the P. falciparum SABP. Figure 4 shows the relative position of 
the E31a ORF on chromosome 7. 

The homology between the DBL domains of E31a and the erythrocyte-binding proteins is due to 
the presence of short motifs of highly conserved amino acids. These well-conserved stretches are separated by 
non-homologous sequences and by deletions and insertions that vary the size of the domain by greater than 60 aa. 
The typical DBL domain contains 12 or more cysteine residues and has 7 conserved tryptophan residues. Additional 
well conserved amino acids include 4 arginines, 3 aspartates, 9 positions with aliphatic residues (alanine, isoleucine, 
leucine, or valine) and 4 with aromatic amino acids (tryptophan, phenylalanine, or tyrosine). 
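
The residue counts given above suggest a crude first-pass screen for candidate sequences, sketched below. This is only an illustration under stated assumptions: the function name residue_screen and the toy input string are hypothetical, and counting residues cannot establish that they sit at conserved positions, which would require an alignment against known DBL domains.

```python
from collections import Counter

MIN_CYS = 12  # "12 or more cysteine residues"
MIN_TRP = 7   # "7 conserved tryptophan residues"


def residue_screen(protein):
    """Count Cys and Trp and report whether both thresholds are met."""
    counts = Counter(protein.upper())
    cys, trp = counts["C"], counts["W"]
    return {"C": cys, "W": trp, "passes": cys >= MIN_CYS and trp >= MIN_TRP}


# Made-up string for demonstration; it is not a real DBL domain sequence.
toy = "CW" * 7 + "C" * 5 + "AKLV"
print(residue_screen(toy))
```

A sequence that fails this count is unlikely to contain a typical DBL domain, whereas a sequence that passes still needs to be aligned before any conclusion is drawn.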

Probes spanning the sequence that encodes the E31a DBL domain hybridized to multiple fragments 
within a single restriction digest and yielded bands that varied among parasite lines. The numerous distinct bands 
from a selection of different parasite DNAs indicated a large number of diverse but related elements. These multiple 
bands varied among different P. falciparum clones, in contrast to the well-conserved, single-copy signal obtained with 
the eba-175 probe. 

Because of the numerous cross-hybridizing sequences, it seemed likely that many of these related 
sequences would be on different chromosomes of the parasite. PFG electrophoresis of P. falciparum Dd2 
chromosomes and hybridization with the E31a probe identified a number of cross-hybridizing sequences on multiple 
chromosomes. A control hybridization with the eba-175 probe under identical conditions yielded a single band of 
hybridization from chromosome 7. 

RNA Analysis of DBL Elements. Sequences from E31a (pS31H insert) were used to probe RNA 
blots for corresponding transcripts. No hybridization was detected. Because it was still possible that a message 
of low abundance was not being detected on the RNA blot, RT-PCR was used as a means of more sensitive 
detection. For this purpose, cDNA was generated by RT from random primers annealed to DNAse-treated total RNA. 
E31a-specific oligonucleotides were then used to test for amplification from the cDNA. No amplification of the E31a 
sequence was obtained, while genomic DNA controls and amplification from cDNA by dihydrofolate 
reductase/thymidylate synthetase-specific primers yielded the expected bands. A screen of a cDNA library with E31a 
specific probes also failed to detect any clones hybridizing with the ORF. These results indicate that E31a is either 
a pseudogene, or is expressed in parasite strains or stages not examined in this work. 

A PCR Method to Isolate Sequences Encoding DBL Domains. The identification of short 
conserved motifs in DBL domains that otherwise have extreme diversity led to a PCR strategy using degenerate 
oligonucleotide primers designed from conserved amino acid sequences in the DBL domains. Sequences PRRQKLC 
and PQFLRW were judged most suitable for minimizing degeneracy while allowing amplification of expressed DBL 
sequences. After these considerations and adjustment for P. falciparum codon usage, primers UNIEBP5' and 
UNIEBP3' were synthesized. 
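
The degeneracy of each primer can be made concrete by expanding the bracketed alternatives into every individual sequence present in the primer pool. The sketch below assumes the primers are written with hyphen-separated codons and (X/Y) alternatives, as in SEQ ID NO:23 and SEQ ID NO:24; the function name expand is hypothetical.

```python
import itertools
import re


def expand(primer):
    """Expand a primer written with (X/Y) alternatives into all sequences."""
    # Each token is either a parenthesised set of alternatives or one base;
    # hyphens between codons are ignored because they match neither pattern.
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", primer.upper())
    choices = [alt.split("/") if alt else [base] for alt, base in tokens]
    return ["".join(combo) for combo in itertools.product(*choices)]


UNIEBP5 = "CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG"   # SEQ ID NO:23
UNIEBP3 = "CCA-(A/T)C(T/G)-(T/G)A(A/G)-(A/G)AA-TTG-(A/T)GG"  # SEQ ID NO:24

for name, primer in (("UNIEBP5'", UNIEBP5), ("UNIEBP3'", UNIEBP3)):
    seqs = expand(primer)
    print(name, len(seqs), "sequences of length", len(seqs[0]))
```

With five two-fold positions in the forward primer and six in the reverse primer, the pools contain 32 and 64 sequences respectively, which illustrates how restricting the design to preferred codons keeps degeneracy low.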

While some P. falciparum lines yielded similar patterns of amplified bands (e.g., Dd2 and MCamp; 
FCR3/A2 and K-1), no two separate isolates showed identical patterns, reflecting the diversity of the DBL domains 
in the parasite lines. A few bands of the same apparent size were present in many isolates. These included a 
consistent 490 bp product that was determined to derive from eba-175 by its expected size and hybridization to 
a gene-specific probe. The number of discernible bands probably underestimates the number of amplifiable sequences 
because of overlapping products of the same size and possible preferential amplification of some sequences over 
others. Nevertheless, the parasite-specific patterns in the amplified bands may provide a means to quickly type 
isolates and serve as a measure of parasite diversity in field samples. 

To identify DBL-encoding sequences in RNA transcripts, the UNIEBP primers were used to amplify 
first-strand cDNAs generated from DNAse-treated RNA preparations. Amplified products from Dd2, 3D7, HB3 and 
MCamp cDNAs had diverse sizes ranging from 400 bp to nearly 1 kb. These included a band at 480-500 bp that 
was determined to be eba-175 from its expected size and cross-hybridization to an eba-175-specific probe. Other 
bands were from amplification of different transcripts encoding DBL domains. Dd2-NM1 RNA, for example, yielded 
bands above the eba-175 product that included two related sequences (dbl-nm1-4, dbl-nm1-5). These bands were 
found to be isolate-specific and to have features consistent with the var genes described in Example 3, below. 
Probes that detect dbl-nm1-4 and dbl-nm1-5 hybridized to multiple chromosomes and aligned more closely with E31a 
than with EBA-175 or DABP. 

The RT-PCR amplifications also yielded a consistent band that encoded a novel DBL domain distinct 
from eba-175. A cDNA clone corresponding to this product was isolated by screening a λgt10 Dd2 cDNA library 
with a radiolabeled ebl-1 probe. Sequence from this and additional overlapping cDNA clones confirmed the conserved 
motifs of the DBL domain. The alignment of the predicted amino acid sequences showed that the DBL domain of 
ebl-1 is more similar to eba-175 than to the multicopy genes. There was, however, extensive divergence from 
eba-175 and other known genes outside of the amplified region. 

In contrast to the multicopy hybridization patterns of dbl-nm1-4 and dbl-nm1-5, the ebl-1 sequence, 
like that of eba-175, was found to have hybridization patterns consistent with a conserved single-copy gene. Probes 
specific for ebl-1 hybridized only to chromosome 13, and restriction analysis with the enzymes Cla I, EcoR I, Hind III, 
Hinf I, Nsi I, Bsa I, and Spe I all yielded bands expected from a single copy sequence. RNA blots probed with 
ebl-1-specific sequences showed several bands of hybridization, however, corresponding to 8-9.5 kb transcripts in 
mRNA from the Dd2 and 3D7 parasites. The transcripts of different size may result from alternative start and 
termination points or from incompletely processed species containing introns. 
EXAMPLE 3: Isolation of var genes 

Parasite clones, DNA analysis and chromosome mapping. Parasite clones were cultivated by the methods 
of Trager, et al. (1976), Science 193, 673-675. DNA was extracted from parasite cultures as described (Peterson, 
et al. (1988), Proc. Natl. Acad. Sci. USA 85, 9114-9118) except that the DNA was recovered by ethanol 
precipitation rather than spooling. Fingerprint analysis with the pC4.H32 probe was used to confirm DNA 
preparations (Dolan, et al. (1993), Mol. Biochem. Parasitol. 61, 137-142). Southern blotting to Nytran membranes 
was as recommended by the manufacturer (Schleicher & Schuell, Keene, NH). PFG separation of the 14 P. falciparum 
chromosomes and chromosome mapping were performed as described (Wellems, et al. (1987), Cell 49, 633-642; 
Sinnis, et al. (1988), Genomics 3, 287-295). 

RNA isolation. Parasites from 200 ml mixed stage cultures (5-10% parasitemia) were released by saponin 
lysis as for DNA preparations except that the procedures were performed with ice-cold solutions. RNA was 
immediately isolated from the parasite pellet by guanidine thiocyanate/phenol-chloroform methods, recovered and 
treated with RNAase-free DNAse (Creedon, et al. (1994), J. Biol. Chem. 269, 16364-16370). RNA in H2O was 
combined with 2 vol 100% EtOH, distributed into 2 ml vials and frozen as stock at -70°C. RNA was recovered by 
precipitation with 0.1 vol 3M NaOAc. RNA blots were generated and probed as described (Creedon, et al. (1994), 
J. Biol. Chem. 269, 16364-16370). 

YAC isolation, chromosome-segment libraries and cDNA libraries. Overlapping YACs spanning the 300 kb 
segment of chromosome 7 that contains the CQR locus were obtained from a YAC library of a CQR FCR3 parasite 
line (de Bruin, et al. (1992), Genomics 14, 332-339) by the procedures of Lanzer, et al. (1993), Nature 361, 654-657. 
Orientation of the YACs and their overlaps were identified with probes obtained from the YAC ends by inverted PCR. 

Attempts to construct cosmid libraries and large insert (~10 kb) λ libraries from high molecular 
weight P. falciparum genomic DNA yielded only rearranged clones. An alternative approach was therefore taken in 
which chromosome-segment libraries were constructed that contained small (0.5-5 kb) inserts in plasmid vectors. 
Plasmid libraries containing AluI, HinfI, RsaI and SspI inserts in pCDNAII were constructed from Dd2 chromosome 
7 restriction fragments purified by pulsed field gel (PFG) electrophoresis (Wellems, et al. (1991), Proc. Natl. Acad. 
Sci. USA 88, 3382-3386). A plasmid library from a 34 kb ApaI-SmaI restriction fragment of YAC PfYED9 was 
constructed by the same methods. Inserts in the plasmid libraries were generally 0.5-4 kb. 

The λgt10 Dd2 cDNA library was prepared under contract by CloneTech Laboratories Inc. (Palo 
Alto, CA) from the DNAse-treated, polyA+ fraction of Dd2 RNA. The cDNA was generated in two separate reactions 
using oligo-dT primers or random primers. Products of these reactions were combined, processed and cloned into the 
EcoRI site of λgt10. 1.6 x 10^ independent recombinants were obtained and amplified. 

Isolation of overlapping clones and DNA sequencing. Plasmid clones from the chromosome-segment and 
YAC-segment libraries were picked at random and their locations were established by restriction mapping. After 
sequence data from these clones were generated, overlapping clones were isolated in a process of "chromosome 
walking" by rescreening the libraries with oligonucleotide probes near the ends of sequenced inserts. Sufficient 
divergence was present among repetitive elements in the sequences to allow distinction of clones and unambiguous 
assignment of overlaps (generally 50-200 bp). 

Sequencing reactions with single-strand M13 DNA (1 µg) and double-strand plasmid DNA (2-5 µg) 
were performed in 96-well polyvinyl chloride U-bottom microassay plates using a Sequenase protocol recommended 
by United States Biochemical Corp. (Cleveland, OH). Reactions were separated by 8M urea-6% polyacrylamide 
sequencing gels and exposed to Kodak BioMax MR film. Sequence data from some clones were also obtained by 
use of an ABI 373A automated DNA sequencer (Applied Biosystems Inc., Foster City, CA). Cycle sequencing 
reactions were performed using the ABI PRISM DyeDeoxy system. 

DNA sequence editing, analyses and display were performed with MacVector software (International 
Biotechnologies Inc., New Haven, CT), BLAST (Altschul, et al. (1990), J. Mol. Biol. 215, 403-410), Genetics Computer 
Group programs (Devereux, et al. (1984), Nucleic Acids Res. 12, 387-395) and the DNADRAW package (Shapiro, et 
al. (1986), Nucleic Acids Res. 14, 65-73) maintained at the National Institutes of Health. 

Identification of a large hypervariable region within a chromosome 7 segment linked to chloroquine 
resistance. Four overlapping yeast artificial chromosomes from the P. falciparum FCR3 line were obtained that span 
the 300 kb chromosome segment linked to CQR, a segment located 300-600 kb from the telomere of chromosome 
7. Figure 5 shows the positions of these YACs (PfYEF2, PfYFEB, PfYKFB, PfYED9) relative to the chromosome map. 
In order to define the structure of this 300 kb segment, we performed comparative hybridizations to search for 
polymorphisms between parasite lines. Clones were randomly picked from chromosome segment-specific plasmid 
libraries and their inserts were hybridized against restriction digests of the YAC and parasite DNAs. Over thirty 
inserts were identified that recognized PfYEF2, PfYFEB or PfYKFB and showed a preponderance of single copy 
sequences with few polymorphisms (AluI, HinfI, RsaI and SspI digests), consistent with prior findings that 
chromosome internal regions are largely conserved and contain a preponderance of single copy sequences. However, 
fifteen other inserts that recognized PfYED9 showed highly polymorphic sets of repetitive elements in the parasite 
DNAs. Southern analysis indicated that these polymorphic elements were part of a chromosome hypervariable region 
contained within the PfYED9 clone. 

Mapping and DNA sequencing of the hypervariable region spanned by YAC PfYED9. Single copy sequences 
detected by pE45b and pH270.5 flank the hypervariable region spanned by PfYED9 (Figure 5). The pE45b and 
pH270.5 probes were therefore used to assign large restriction fragments on the PfYED9 map and establish enzyme 
recognition sites as reference points. A detailed restriction map of the PfYED9 hypervariable region was then 
developed. Fifteen overlapping clones ("a"-"f" and "h"-"o" in Figure 5) were isolated by a chromosome walking 
approach from Dd2 chromosome subsegment libraries (Wellems et al., supra). The inserts yielded 19.1 kb of 
continuous Dd2 sequence having predicted enzyme recognition sites in perfect accord with the PfYED9 restriction 
map. Such agreement indicates that the Dd2 and FCR3 sequences in this part of the chromosome are very similar, 
despite differences elsewhere in the genome that are evident by restriction analysis. 

We also obtained genomic sequence data from the 34 kb ApaI-SmaI fragment of PfYED9. Purified
PfYED9 DNA was cut with SmaI to yield a 110 kb fragment, which was then isolated by PFG electrophoresis and
digested with ApaI. The resulting 34 kb ApaI-SmaI band was purified by PFG electrophoresis, digested in four
separate reactions by AluI, HinfI, RsaI or SspI and incorporated into a plasmid (pCDNAII) library. Cloned inserts from
the library were checked for hybridization to the PfYED9 34 kb fragment, assigned to the PfYED9 map and
sequenced (Figure 5). Overlapping inserts were obtained by the chromosome walking approach except for three gaps
("t", "z", in Figure 5) which were closed by PCR amplification of PfYED9 DNA using primers from flanking
sequences. The clones from PfYED9 ("r"-"z",>", V and "a"+")ff" in Figure 5) yielded 22.2 kb of continuous DNA
sequence that overlaps the Dd2 sequence at the "f"/")?" junction and has predicted restriction sites that match the
PfYED9 map perfectly. The composite sequence from the Dd2 and PfYED9 segments is 40,171 bp.
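
The assembly of a composite sequence from two overlapping segments can be sketched as a suffix-prefix merge. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name, the `min_overlap` parameter and the toy sequences are hypothetical. The arithmetic at the end only uses the lengths quoted in the text (19.1 kb, 22.2 kb and 40,171 bp), so the implied overlap is approximate because the first two figures are rounded.

```python
def merge_with_overlap(left, right, min_overlap=3):
    """Join two sequences on their longest exact suffix/prefix overlap.

    Returns (merged_sequence, overlap_length).
    """
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:], k
    raise ValueError("no overlap of the required length was found")


def implied_overlap(len_a, len_b, composite_len):
    """Overlap length implied by two segment lengths and the composite length."""
    return len_a + len_b - composite_len


if __name__ == "__main__":
    merged, k = merge_with_overlap("ACGTTGCA", "TGCAGGAT")
    print(merged, k)
    # Rounded segment lengths from the text: 19.1 kb and 22.2 kb.
    print(implied_overlap(19_100, 22_200, 40_171), "bp (approximate)")
```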

Structure of a var gene cluster and comparative analysis of predicted amino acid sequences. The 40,171
bp sequence contains three 10-12 kb regions that have related sequences and structure. Each of these regions
harbors a pair of ORFs. The first ORF in each pair begins with a consensus ATG start codon preceded by typical
P. falciparum non-coding sequence of abundant A+T content. The ORFs of each pair are separated by an intervening
AT-rich and non-coding sequence of 0.9 kb to 1.1 kb. Presence of consensus intron-exon splice junction sequences
at either end of these intervening sequences and lack of a consistent translation start site in the 3' ORF indicate
that each pair of ORFs belongs to an individual gene having a two exon structure. This has been verified by
comparison of the genomic sequences to the cDNA sequence of an expressed gene (var-7; see subsequent section).
The three 10 kb to 12 kb regions thus contain members of a variant gene family which have coding regions of
9.23 kb (var-1), 7.99 kb (var-2) and 9.01 kb (var-3). Predicted molecular weights of the encoded proteins are 350
kD, 302 kD and 344 kD, respectively.
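
The relationship between coding-region length and predicted molecular weight can be checked with a rough calculation. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the specification; the function name is hypothetical. With this assumption the estimates fall a few percent below the stated values, which is expected for a rule-of-thumb figure.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the text


def estimate_mass_kd(coding_bp, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Estimate protein mass in kD from the length of a coding region in bp."""
    codons = coding_bp / 3
    return codons * residue_mass_da / 1000.0


if __name__ == "__main__":
    # (coding length in bp, molecular weight stated in the text in kD)
    stated = [(9230, 350), (7990, 302), (9010, 344)]
    for coding_bp, stated_kd in stated:
        estimate = estimate_mass_kd(coding_bp)
        print(f"{coding_bp} bp: estimated {estimate:.0f} kD, stated {stated_kd} kD")
```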
The var genes are flanked by additional members of the var family in PfYED9. Restriction analysis
identified two additional genes that are 12-35 kb upstream of the sequenced region and are closely related to var-2
and var-3 (var-2c and var-3c, Figure 5). The genes thus have a clustered arrangement in which many individual
members are organized in head-to-tail fashion. Between var-1 and var-2 is a 5 kb DNA sequence that harbors a short
ORF homologous to that of a repetitive element (rif) suggested to be a transposable element in P. falciparum.

The deduced protein sequences of the var genes are highly diverse, yet all contain certain
conserved motifs and common structural features. Database searches identified 2 to 4 domains within each var
sequence that are homologous to cysteine-rich domains of SABP and DABP. In the var sequences, the first domain
near the amino-terminus (DBL domain 1) is the most conserved of the DBL domains and has amino acid signatures
that differentiate it from subsequent domains (e.g. consensus peptide sequences GAcAp[Y/F]rrL,
CTxLARsfadlgdIVrgrdLYLG and VPTYFDYVpqyIrwF). Between DBL domains 1 and 2 is another type of conserved
domain, a cysteine-rich interdomain region (CIDR) of 300-400 amino acids. The CIDR does not have all the motifs
of a DBL domain, but it does have a region at the 3' end which is homologous to the end of the F1 DBL domain in
SABP. The conservation evident in the sequences of DBL domain 1 and the CIDR suggests that these regions maintain
important structures in the head of the variant molecule.
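
Consensus signatures of the kind quoted above can be turned into search patterns. The sketch below is illustrative only and rests on an assumption the text does not state: upper-case letters are treated as required residues, lower-case letters (including "x") as positions where any residue is accepted, and a bracketed group such as [Y/F] as a choice between the listed residues. The function name and the test peptide are hypothetical.

```python
import re


def consensus_to_regex(consensus):
    """Compile a consensus peptide string into a regular expression.

    Assumed convention: upper case = required residue, lower case = any
    residue, [A/B] = either residue.
    """
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "[":
            end = consensus.index("]", i)
            options = consensus[i + 1:end].replace("/", "")
            parts.append(f"[{options}]")
            i = end + 1
            continue
        parts.append(ch if ch.isupper() else ".")
        i += 1
    return re.compile("".join(parts))


if __name__ == "__main__":
    pattern = consensus_to_regex("GAcAp[Y/F]rrL")
    print(pattern.pattern)
    hit = pattern.search("MKGACAPFRRLT")  # hypothetical peptide
    print(hit.start() if hit else None)
```

Treating lower-case positions as wildcards is the most permissive reading; a stricter search could instead score lower-case positions as preferred residues.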

DBL domains 2, 3 and 4 (numbering is according to var-1, the first sequence completed) have
less discriminating signatures than domain 1, and show features of cross-alignment and variation in number that
suggest these domains can undergo shuffling and deletion.

DBL domain 4 is followed by a segment of variable length and a hydrophobic region that is
encoded at the end of the first exon (exon I). In all var sequences this hydrophobic region fits the criteria of a
transmembrane segment. The second exon (exon II) encodes a large (45-55 kD) conserved C-terminal sequence that
has an acidic character (predicted pI 4.5, vs. 5.9 for the part of the protein upstream of the splice junction) and
a cysteine content of < 1% (vs. > 4% upstream). The position of this C-terminal sequence downstream of a
single transmembrane segment suggests that it has a cytoplasmic location.

No consensus signal sequence was detected in the NH2-terminal region of the predicted var ORFs.
We note the presence of several motifs in the protein sequences that are known to act as ligands and receptors in
the integrin family. These include RGD (var-1 codons 886-88, 1992-94) and DGEA (var-1 codons 2111-14). Not
all of these motifs occur in each protein sequence and, when they do occur, their positions vary.
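
Locating short motifs such as RGD and DGEA in a predicted protein sequence is a simple substring search. The sketch below reports 1-based residue (codon) ranges in the same style as the text; the function name and the toy peptide are hypothetical and the var sequences themselves are not reproduced here.

```python
def find_motifs(protein, motifs=("RGD", "DGEA")):
    """Return {motif: [(first_residue, last_residue), ...]} with 1-based positions."""
    hits = {}
    for motif in motifs:
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append((start + 1, start + len(motif)))
            # Continue one residue further so overlapping matches are not missed.
            start = protein.find(motif, start + 1)
        hits[motif] = positions
    return hits


if __name__ == "__main__":
    toy_peptide = "AARGDKKDGEARGD"  # hypothetical, for illustration only
    print(find_motifs(toy_peptide))
```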

Identification of var transcripts and chromosome expression sites. To identify transcribed var sequences
we screened a λgt10 Dd2 cDNA library with var-containing BssHII restriction fragments that had been purified from
PfYED9 and radiolabeled by random hexamer priming. This screening yielded 18 clones with inserts that hybridized
back to PfYED9. By cross-hybridization studies and DNA sequence analysis the inserts fell into two groups: group
I inserts that aligned with sequences of var exon I (λT240, λT242, λT244, AU67, AJ298);
and group II inserts that aligned with sequences of var exon II (λT140, λT141, λT142, λT145, λT147, λT148,
λT150, λT152).

The full ORF of an expressed var gene (var-7) was determined from λT242 and overlapping cDNA
clones that were obtained by a PCR-based walking strategy. The sequence showed that var-7 has a 6.6 kb ORF
containing two DBL domains, a hydrophobic transmembrane sequence and carboxy-terminal region typical of var genes
(predicted molecular weight 249 kD). Comparison of var-7 with the var-1 sequence demonstrated continuity of the
alignments at the predicted splice junction between the ORFs of exons I and II. PCR amplification of Dd2 genomic
DNA was also performed with primers derived from the two var-7 exons. Sequence of this var-7 PCR product
confirmed consensus splice sites and a 1 kb intron typical of the var genes. Transcription of var-7 was detected
as a 7.5 kb band by RNA blot analysis.

Chromosome mapping experiments with a var-7-specific probe localized the var-7 gene to a region
that is 600 kb from one end of Dd2 chromosome 12 (chromosome 12 has a length of 2600 kb). No hybridization
of the var-7 probe was detected to any other Dd2 chromosome nor to any chromosomes of the HB3, 3D7 or A4
parasites. Other cDNA inserts from the group I clones were also sequenced and examined for chromosome
hybridization signals. The λT240 cDNA insert mapped to the var-1/var-2/var-3 cluster on Dd2 chromosome 7 and
its sequence matched that of var-3. The AJ2AA. AT28A. AJ287, AJ26B, A12QS and >^T296 inserts all showed
overlapping sequences and yielded the same hybridization patterns. Chromosome sites recognized by these inserts
included regions within two SmaI fragments from Dd2 chromosome 7 and another from chromosome 9. We note
that loss of a cytoadherence phenotype has been correlated with a chromosome 9 deletion in certain P. falciparum
lines.

1.8 kb to 2.4 kb RNA transcripts related to var exon II. In addition to the 7.5 kb var-7 band, a broad 1.8
kb to 2.4 kb band was detected on RNA blots after hybridization with a probe that recognizes var exon II.
Sequences of eight group II cDNA inserts homologous to exon II were therefore determined and aligned against the
var genes. Comparative analysis of the insert sequences showed that all differed from one another in regions of
overlap, indicating that transcription of the corresponding RNAs was from different loci. Three of the cDNA
sequences (λT140, λT141 and λT148) aligned downstream of the intron/exon II splice junction. However, five other
cDNA inserts (λT142, λT145, λT147, λT150 and λT152) had sequences that aligned upstream of the var
intron/exon II splice site and included regions homologous to var intron sequences. In the vicinity of the splice
junction, consensus splice sites occurred in three of the cDNA sequences (λT142, λT147, λT150) while a fourth
cDNA sequence (λT145) showed the required AG dinucleotide but not the expected pyrimidine tract of the splice consensus.
The part of the fifth sequence (λT152) that aligned with the var intron extended upstream only to the TAG of the
splice sequence. All five sequences lacked a consensus start codon preceded by A+T-rich non-coding DNA that is
typical of P. falciparum translation start sites.
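
The two features examined at the intron/exon II junction, a terminal AG dinucleotide and an upstream pyrimidine tract, can be tested with a short routine. This is a minimal sketch under stated assumptions: the window length (`tract_len`) and the pyrimidine threshold (`min_pyrimidine_fraction`) are hypothetical illustration values, not criteria given in the specification, and the toy sequences are invented.

```python
def acceptor_features(intron_3prime, tract_len=10, min_pyrimidine_fraction=0.8):
    """Inspect the 3' end of an intron.

    Returns (has_ag, has_pyrimidine_tract) where the tract is the window of
    `tract_len` bases immediately upstream of the final dinucleotide.
    """
    seq = intron_3prime.upper()
    has_ag = seq.endswith("AG")
    tract = seq[-(tract_len + 2):-2]
    if not tract:
        return has_ag, False
    pyrimidines = sum(base in "CT" for base in tract)
    has_tract = pyrimidines / len(tract) >= min_pyrimidine_fraction
    return has_ag, has_tract


if __name__ == "__main__":
    print(acceptor_features("AATTTTTTTCTTTTAG"))  # AG present, tract present
    print(acceptor_features("GATAGAAGGAAGAG"))    # AG present, tract absent
```

The second toy call corresponds to the situation described for the fourth cDNA sequence: the AG dinucleotide is found but the upstream window is not pyrimidine-rich.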

Isolate-specific var sequences and evidence for DNA recombination in cultivated parasite clones. The
diversity of var forms expressed by P. falciparum parasites reflects a tremendous repertoire in the var gene family.
This repertoire is evident in the patterns of restriction polymorphism detected by var probes as well as in the
detection of var-specific sequences that hybridize to some parasite DNAs but not to others. The var-7 gene
expressed by Dd2, for example, is not present in the HB3, 3D7 or A4 genomes. Such var diversity suggests that
frequent DNA rearrangements underlie the production of antigenically variant types in different parasite strains.

To test for DNA rearrangements in parasites cultivated in vitro, we used var sequences to probe
restricted DNAs from Dd2 lines adapted to neuraminidase-treated erythrocytes. In one rearrangement a novel 35
kb BglI fragment is seen in NM1 DNA probed with the λT142 (group II) insert. In another rearrangement a deletion
of a 20 kb PstI band is evident in NM8 DNA probed with a var-1 sequence. Deletion of this 20 kb band was also
detected in the Dd2/R8 subclone obtained before neuraminidase selection, indicating that the DNA rearrangement was
not produced by selection in neuraminidase-treated erythrocytes.

The above examples are provided to illustrate the invention, and other variants of the invention
encompassed by the claims will be readily apparent to one of ordinary skill in the art.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: The United States, As Represented by the
Secretary, Department of Health and Human Services

(ii) TITLE OF INVENTION: BINDING DOMAINS FROM PLASMODIUM VIVAX
AND PLASMODIUM FALCIPARUM ERYTHROCYTE BINDING PROTEINS

(iii) NUMBER OF SEQUENCES: 45

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Knobbe Martens Olson & Bear

(B) STREET: 620 Newport Center Drive 16th Floor

(C) CITY: Newport Beach

(D) STATE: California

(E) COUNTRY: US

(F) ZIP: 92660

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US08/487826

(B) FILING DATE: 07-JUN-1996

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Israelsen, Ned

(B) REGISTRATION NUMBER: 29,655

(C) REFERENCE/DOCKET NUMBER: NIH12 1 . 0 0 IQPC

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (619) 235-8550

(B) TELEFAX: (619) 235-0176

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4084 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium vivax

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AAGCTTTTAA AAATAGCAAC AAAATTTCGA AACATTGCCA CAAAAATTTT ATGTTTTACA 60
TATATTTAGA TTCATACAAT TTAGGTGTAC CCTGTTTTTT GATATATGCG CTTAAATTTT 120
TTTTTCGCTC ATATGTTTAG TTATATGTGT AGAACAACTT GCTGAATAAA TTACGTACAC 180
TTTCTGTTCT GAATAATATT ACCACATACA TTTAATTTTA AATACTATGA AAGGAAAAAA 240
CCGCTCTTTA TTTGTTCTCC TAGTTTTATT ATTGTTACAC AAGGTATCAT ATAAGGATGA 300
TTTTTCTATC ACACTAATAA ATTATCATGA AGGAAAAAAA TATTTAATTA TACTAAAAAG 360
AAAATTAGAA AAAGCTAATA ATCGTGATGT TTGCAATTTT TTTCTTCATT TCTCTCAGGT 420
AAATAATGTA TTATTAGAAC GAACAATTGA AACCCTTCTA GAATGCAAAA ATGAATATGT 480
GAAAGGTGAA AATGGTTATA AATTAGCTAA AGGACACCAC TGTGTTGAGG AAGATAACTT 540
AGAACGATGG TTACAAGGAA CCAATGAAAG AAGAAGTGAG GAAAATATAA AATATAAATA 600
TGGAGTAACG GAACTAAAAA TAAAGTATGC GCAAATGAAT GGAAAAAGAA GCAGCCGCAT 660
TTTGAAGGAA TCAATTTACG GGGCGCATAA CTTTGGAGGC AACAGTTACA TGGAGGGAAA 720
AGATGGAGGA GATAAAACTG GGGAGGAAAA AGATGGAGAA CATAAAACTG ATAGTAAAAC 780
TGATAACGGG AAAGGTGCAA ACAATTTGGT AATGTTAGAT TATGAGACAT CTAGCAATGG 840
CCAGCCAGCG GGAACCCTTG ATAATGTTCT TGAATTTGTG ACTGGGCATG AGGGAAATTC 900
TCGTAAAAAT TCCTCGAATG GTGGCAATCC TTACGATATT GATCATAAGA AAACGATCTC 960
TAGTGCTATT ATAAATCATG CTTTTCTTCA AAATACTGTA ATGAAAAACT GTAATTATAA 1020
GAGAAAACGT CGGGAAAGAG ATTGGGACTG TAACACTAAG AAGGATGTTT GTATACCAGA 1080
TCGAAGATAT CAATTATGTA TGAAGGAACT TACGAATTTG GTAAATAATA CAGACACTAA 1140
TTTTCATAGG GATATAACAT TTCGAAAATT ATATTTGAAA AGGAAACTTA TTTATGATGC 1200
TGCAGTAGAG GGCGATTTAT TACTTAAGTT GAATAACTAC AGATATAACA AAGACTTTTG 1260
CAAGGATATA AGATGGAGTT TGGGAGATTT TGGAGATATA ATTATGGGAA CGGATATGGA 1320
AGGCATCGGA TATTCCAAAG TAGTGGAAAA TAATTTGCGC AGCATCTTTG GAACTGATGA 1380
AAAGGCCCAA CAGCGTCGTA AACAGTGGTG GAATGAATCT AAAGCACAAA TTTGGACAGC 1440
AATGATGTAC TCAGTTAAAA AAAGATTAAA GGGGAATTTT ATATGGATTT GTAAATTAAA 1500
TGTTGCGGTA AATATAGAAC CGCAGATATA TAGATGGATT CGAGAATGGG GAAGGGATTA 1560
CGTGTCAGAA TTGCCCAGAG AAGTGCAAAA ACTGAAAGAA AAATGTGATG GAAAAATCAA 1620
TTATACTGAT AAAAAAGTAT GTAAGGTACC ACCATGTCAA AATGCGTGTA AATCATATGA 1680
TCAATGGATA ACCAGAAAAA AAAATCAATG GGATGTTCTG TCAAATAAAT TCATAAGTGT 1740
AAAAAACGCA GAAAAGGTTC AGACGGCAGG TATCGTAACT CCTTATGATA TACTAAAACA 1800
GGAGTTAGAT GAATTTAACG AGGTGGCTTT TGAGAATGAA ATTAACAAAC GTGATGGTGC 1860
ATATATTGAG TTATGCGTTT GTTCCGTTGA AGAGGCTAAA AAAAATACTC AGGAAGTTGT 1920
GACAAATGTG GACAATGCTG CTAAATCTCA GGCCACCAAT TCAAATCCGA TAAGTCAGCC 1980
TGTAGATAGT AGTAAAGCGG AGAAGGTTCC AGGAGATTCT ACGCATGGAA ATGTTAACAG 2040
TGGCCAAGAT AGTTCTACCA CAGGTAAAGC TGTTACGGGG GATGGTCAAA ATGGAAATCA 2100
GACACCTGCA GAAAGCGATG TACAGCGAAG TGATATTGCC GAAAGTGTAA GTGCTAAAAA 2160
TGTTGATCCG CAGAAATCTG TAAGTAAAAG AAGTGACGAC ACTGCAAGCG TTACAGGTAT 2220
TGCCGAAGCT GGAAAGGAAA ACTTAGGCGC ATCAAATAGT CGACCTTCTG AGTCCACCGT 2280
TGAAGCAAAT AGCCCAGGTG ATGATACTGT GAACAGTGCA TCTATACCTG TAGTGAGTGG 2340
TGAAAACCCA TTGGTAACCC CCTATAATGG TTTGAGGCAT TCGAAAGACA ATAGTGATAG 2400
CGATGGACCT GCGGAATCAA TGGCGAATCC TGATTCAAAT AGTAAAGGTG AGACGGGAAA 2460
GGGGCAAGAT AATGATATGG CGAAGGCTAC TAAAGATAGT AGTAATAGTT CAGATGGTAC 2520
CAGCTCTGCT ACGGGTGATA CTACTGATGC AGTTGATAGG GAAATTAATA AAGGTGTTCC 2580
TGAGGATAGG GATAAAACTG TAGGAAGTAA AGATGGAGGG GGGGAAGATA ACTCTGCAAA 2640
TAAGGATGCA GCGACTGTAG TTGGTGAGGA TAGAATTCGT GAGAACAGCG CTGGTGGTAG 2700
CACTAATGAT AGATCAAAAA ATGACACGGA AAAGAACGGG GCCTCTACCC CTGACAGTAA 2760
ACAAAGTGAG GATGCAACTG CGCTAAGTAA AACCGAAAGT TTAGAATCAA CAGAAAGTGG 2820
AGATAGAACT ACTAATGATA CAACTAACAG TTTAGAAAAT AAAAATGGAG GAAAAGAAAA 2880
GGATTTACAA AAGCATGATT TTAAAAGTAA TGATACGCCG AATGAAGAAC CAAATTCTGA 2940
TCAAACTACA GATGCAGAAG GACATGACAG GGATAGCATC AAAAATGATA AAGCAGAAAG 3000
GAGAAAGCAT ATGAATAAAG ATACTTTTAC GAAAAATACA AATAGTCACC ATTTAAATAG 3060
TAATAATAAT TTGAGTAATG GAAAATTAGA TATAAAAGAA TACAAATACA GAGATGTCAA 3120
AGCAACAAGG GAAGATATTA TATTAATGTC TTCAGTACGC AAGTGCAACA ATAATATTTC 3180
TTTAGAGTAC TGTAACTCTG TAGAGGACAA AATATCATCG AATACTTGTT CTAGAGAGAA 3240
AAGTAAAAAT TTATGTTGCT CAATATCGGA TTTTTGTTTG AACTATTTTG ACGTGTATTC 3300
TTATGAGTAT CTTAGCTGCA TGAAAAAGGA ATTTGAAGAT CCATCCTACA AGTGCTTTAC 3360
GAAAGGGGGC TTTAAAGGTA TGCAGAAAAA GATGCTGAAT AGAGAAAGGT GTTGAGTAAA 3420
TTAAAAAGGA ATTAATTTTA GGAATGTTAT AAACATTTTT GTACCCAAAA TTCTTTTTGC 3480
AGACAAGACT TACTTTGCCG CGGCGGGAGC GTTGCTGATA CTGCTGTTGT TAATTGCTTC 3540
AAGGAAGATG ATCAAAAATG AGTAACCAGA AAATAAAATA AAATAACATA AAATAAAATA 3600
AAAACTAGAA TAACAATTAA AATAAAATAA AATGAGAAAT GCCTGTTAAT GCACAGTTAA 3660
TTCTAACGAT TCCATTTGTG AAGTTTTAAA GAGAGCACAA ATGCATAGTC ATTATGTCCA 3720
TGCATATATA CACATATATG TACGTATATA TAATAAACGC ACACTTTCTT GTTCGTACAG 3780
TTCTGAAGAA GCTACATTTA ATGAGTTTGA AGAATACTGT GATAATATTC ACAGAATCCC 3840
TCTGATGCCT AACAGTAATT CAAATTTCAA GAGCAAAATT CCATTTAAAA AGAAATGTTA 3900
CATCATTTTG CGTTTTTCTT TTTTTCTTTT TTTTTTCTTT TTTAGATATT GAACACATGC 3960
AGCCATCAAC CCCCCTGGAT TATTCATGAT GCTACTTTGG TAAGTAAAAG CAATTCTGAT 4020
TGTAGTGCTG ATGTAATTTT AGTCATTTTG CTTGCTGCAA TAAACGAGAA AATATATCAA 4080
GCTT 4084

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1115 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium vivax

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Lys Gly Lys Asn Arg Ser Leu Phe Val Leu Leu Val Leu Leu Leu
1               5                   10                  15

Leu His Lys Val Ser Tyr Lys Asp Asp Phe Ser Ile Thr Leu Ile Asn
            20                  25                  30

Tyr His Glu Gly Lys Lys Tyr Leu Ile Ile Leu Lys Arg Lys Leu Glu
        35                  40                  45

Lys Ala Asn Asn Arg Asp Val Cys Asn Phe Phe Leu His Phe Ser Gln
    50                  55                  60

Val Asn Asn Val Leu Leu Glu Arg Thr Ile Glu Thr Leu Leu Glu Cys
65                  70                  75                  80

Lys Asn Glu Tyr Val Lys Gly Glu Asn Gly Tyr Lys Leu Ala Lys Gly
                85                  90                  95

His His Cys Val Glu Glu Asp Asn Leu Glu Arg Trp Leu Gln Gly Thr
            100                 105                 110

Asn Glu Arg Arg Ser Glu Glu Asn Ile Lys Tyr Lys Tyr Gly Val Thr
        115                 120                 125

Glu Leu Lys Ile Lys Tyr Ala Gln Met Asn Gly Lys Arg Ser Ser Arg
    130                 135                 140

Ile Leu Lys Glu Ser Ile Tyr Gly Ala His Asn Phe Gly Gly Asn Ser
145                 150                 155                 160

Tyr Met Glu Gly Lys Asp Gly Gly Asp Lys Thr Gly Glu Glu Lys Asp
                165                 170                 175

Gly Glu His Lys Thr Asp Ser Lys Thr Asp Asn Gly Lys Gly Ala Asn
            180                 185                 190

Asn Leu Val Met Leu Asp Tyr Glu Thr Ser Ser Asn Gly Gln Pro Ala
        195                 200                 205

Gly Thr Leu Asp Asn Val Leu Glu Phe Val Thr Gly His Glu Gly Asn
    210                 215                 220

Ser Arg Lys Asn Ser Ser Asn Gly Gly Asn Pro Tyr Asp Ile Asp His
225                 230                 235                 240

Lys Lys Thr Ile Ser Ser Ala Ile Ile Asn His Ala Phe Leu Gln Asn
                245                 250                 255

Thr Val Met Lys Asn Cys Asn Tyr Lys Arg Lys Arg Arg Glu Arg Asp
            260                 265                 270

Trp Asp Cys Asn Thr Lys Lys Asp Val Cys Ile Pro Asp Arg Arg Tyr
        275                 280                 285

Gln Leu Cys Met Lys Glu Leu Thr Asn Leu Val Asn Asn Thr Asp Thr
    290                 295                 300

Asn Phe His Arg Asp Ile Thr Phe Arg Lys Leu Tyr Leu Lys Arg Lys
305                 310                 315                 320

Leu Ile Tyr Asp Ala Ala Val Glu Gly Asp Leu Leu Leu Lys Leu Asn
                325                 330                 335

Asn Tyr Arg Tyr Asn Lys Asp Phe Cys Lys Asp Ile Arg Trp Ser Leu
            340                 345                 350

Gly Asp Phe Gly Asp Ile Ile Met Gly Thr Asp Met Glu Gly Ile Gly
        355                 360                 365

Tyr Ser Lys Val Val Glu Asn Asn Leu Arg Ser Ile Phe Gly Thr Asp
    370                 375                 380

Glu Lys Ala Gln Gln Arg Arg Lys Gln Trp Trp Asn Glu Ser Lys Ala
385                 390                 395                 400

Gln Ile Trp Thr Ala Met Met Tyr Ser Val Lys Lys Arg Leu Lys Gly
                405                 410                 415

Asn Phe Ile Trp Ile Cys Lys Leu Asn Val Ala Val Asn Ile Glu Pro
            420                 425                 430

Gln Ile Tyr Arg Trp Ile Arg Glu Trp Gly Arg Asp Tyr Val Ser Glu
        435                 440                 445

Leu Pro Thr Glu Val Gln Lys Leu Lys Glu Lys Cys Asp Gly Lys Ile
    450                 455                 460

Asn Tyr Thr Asp Lys Lys Val Cys Lys Val Pro Pro Cys Gln Asn Ala
465                 470                 475                 480

Cys Lys Ser Tyr Asp Gln Trp Ile Thr Arg Lys Lys Asn Gln Trp Asp
                485                 490                 495

Val Leu Ser Asn Lys Phe Ile Ser Val Lys Asn Ala Glu Lys Val Gln
            500                 505                 510

Thr Ala Gly Ile Val Thr Pro Tyr Asp Ile Leu Lys Gln Glu Leu Asp
        515                 520                 525

Glu Phe Asn Glu Val Ala Phe Glu Asn Glu Ile Asn Lys Arg Asp Gly
    530                 535                 540

Ala Tyr Ile Glu Leu Cys Val Cys Ser Val Glu Glu Ala Lys Lys Asn
545                 550                 555                 560

Thr Gln Glu Val Val Thr Asn Val Asp Asn Ala Ala Lys Ser Gln Ala
                565                 570                 575

Thr Asn Ser Asn Pro Ile Ser Gln Pro Val Asp Ser Ser Lys Ala Glu
            580                 585                 590

Lys Val Pro Gly Asp Ser Thr His Gly Asn Val Asn Ser Gly Gln Asp
        595                 600                 605

Ser Ser Thr Thr Gly Lys Ala Val Thr Gly Asp Gly Gln Asn Gly Asn
    610                 615                 620

Gln Thr Pro Ala Glu Ser Asp Val Gln Arg Ser Asp Ile Ala Glu Ser
625                 630                 635                 640

Val Ser Ala Lys Asn Val Asp Pro Gln Lys Ser Val Ser Lys Arg Ser
                645                 650                 655

Asp Asp Thr Ala Ser Val Thr Gly Ile Ala Glu Ala Gly Lys Glu Asn
            660                 665                 670

Leu Gly Ala Ser Asn Ser Arg Pro Ser Glu Ser Thr Val Glu Ala Asn
        675                 680                 685

Ser Pro Gly Asp Asp Thr Val Asn Ser Ala Ser Ile Pro Val Val Ser
    690                 695                 700

Gly Glu Asn Pro Leu Val Thr Pro Tyr Asn Gly Leu Arg His Ser Lys
705                 710                 715                 720

Asp Asn Ser Asp Ser Asp Gly Pro Ala Glu Ser Met Ala Asn Pro Asp
                725                 730                 735

Ser Asn Ser Lys Gly Glu Thr Gly Lys Gly Gln Asp Asn Asp Met Ala
            740                 745                 750

Lys Ala Thr Lys Asp Ser Ser Asn Ser Ser Asp Gly Thr Ser Ser Ala
        755                 760                 765

Thr Gly Asp Thr Thr Asp Ala Val Asp Arg Glu Ile Asn Lys Gly Val
    770                 775                 780

Pro Glu Asp Arg Asp Lys Thr Val Gly Ser Lys Asp Gly Gly Gly Glu
785                 790                 795                 800

Asp Asn Ser Ala Asn Lys Asp Ala Ala Thr Val Val Gly Glu Asp Arg
                805                 810                 815

Ile Arg Glu Asn Ser Ala Gly Gly Ser Thr Asn Asp Arg Ser Lys Asn
            820                 825                 830

Asp Thr Glu Lys Asn Gly Ala Ser Thr Pro Asp Ser Lys Gln Ser Glu
        835                 840                 845

Asp Ala Thr Ala Leu Ser Lys Thr Glu Ser Leu Glu Ser Thr Glu Ser
    850                 855                 860

Gly Asp Arg Thr Thr Asn Asp Thr Thr Asn Ser Leu Glu Asn Lys Asn
865                 870                 875                 880

Gly Gly Lys Glu Lys Asp Leu Gln Lys His Asp Phe Lys Ser Asn Asp
                885                 890                 895

Thr Pro Asn Glu Glu Pro Asn Ser Asp Gln Thr Thr Asp Ala Glu Gly
            900                 905                 910

His Asp Arg Asp Ser Ile Lys Asn Asp Lys Ala Glu Arg Arg Lys His
        915                 920                 925

Met Asn Lys Asp Thr Phe Thr Lys Asn Thr Asn Ser His His Leu Asn
    930                 935                 940

Ser Asn Asn Asn Leu Ser Asn Gly Lys Leu Asp Ile Lys Glu Tyr Lys
945                 950                 955                 960

Tyr Arg Asp Val Lys Ala Thr Arg Glu Asp Ile Ile Leu Met Ser Ser
                965                 970                 975

Val Arg Lys Cys Asn Asn Asn Ile Ser Leu Glu Tyr Cys Asn Ser Val
            980                 985                 990

Glu Asp Lys Ile Ser Ser Asn Thr Cys Ser Arg Glu Lys Ser Lys Asn
        995                 1000                1005

Leu Cys Cys Ser Ile Ser Asp Phe Cys Leu Asn Tyr Phe Asp Val Tyr
    1010                1015                1020

Ser Tyr Glu Tyr Leu Ser Cys Met Lys Lys Glu Phe Glu Asp Pro Ser
1025                1030                1035                1040

Tyr Lys Cys Phe Thr Lys Gly Gly Phe Lys Ile Asp Lys Thr Tyr Phe
                1045                1050                1055

Ala Ala Ala Gly Ala Leu Leu Ile Leu Leu Leu Ile Ala Ser Arg Lys
            1060                1065                1070

Met Ile Lys Asn Asp Ser Glu Glu Ala Thr Phe Asn Glu Phe Glu Glu
        1075                1080                1085

Tyr Cys Asp Asn Ile His Arg Ile Pro Leu Met Pro Asn Asn Ile Glu
    1090                1095                1100

His Met Gln Pro Ser Thr Pro Leu Asp Tyr Ser
1105                1110                1115

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4507 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TATATATATA TATATATATA GATAATAACA TATAAATATA TTCAATGTGC ATACAATGAA 60
ATGTAATATT AGTATATATT TTTTTGCTTC CTTCTTTGTG TTATATTTTG CAAAAGCTAG 120
GAATGAATAT GATATAAAAG AGAATGAAAA ATTTTTAGAC GTGTATAAAG AAAAATTTAA 180
TGAATTAGAT AAAAAGAAAT ATGGAAATGT TCAAAAAACT GATAAGAAAA TATTTACTTT 240
TATAGAAAAT AAATTAGATA TTTTAAATAA TTCAAAATTT AATAAAAGAT GGAAGAGTTA 300
TGGAACTCCA GATAATATAG ATAAAAATAT GTCTTTAATA AATAAACATA ATAATGAAGA 360
AATGTTTAAC AACAATTATC AATCATTTTT ATCGACAAGT TCATTAATAA AGCAAAATAA 420
ATATGTTCCT ATTAACGCTG TACGTGTGTC TAGGATATTA AGTTTCCTGG ATTCTAGAAT 480
TAATAATGGA AGAAATACTT CATCTAATAA CGAAGTTTTA AGTAATTGTA GGGAAAAAAG 540
GAAAGGAATG AAATGGGATT GTAAAAAGAA AAATGATAGA AGCAACTATG TATGTATTCC 600
TGATCGTAGA ATCCAATTAT GCATTGTTAA TCTTAGCATT ATTAAAACAT ATACAAAAGA 660
GACCATGAAG GATCATTTCA TTGAAGCCTC TAAAAAAGAA TCTCAACTTT TGCTTAAAAA 720
AAATGATAAC AAATATAATT CTAAATTTTG TAATGATTTG AAGAATAGTT TTTTAGATTA 780
TGGACATCTT GCTATGGGAA ATGATATGGA TTTTGGAGGT TATTCAACTA AGGCAGAAAA 840
CAAAATTCAA GAAGTTTTTA AAGGGGCTCA TGGGGAAATA AGTGAACATA AAATTAAAAA 900
TTTTAGAAAA GAATGGTGGA ATGAATTTAG AGAGAAACTT TGGGAAGCTA TGTTATCTGA 960
GCATAAAAAT AATATAAATA ATTGTAAAAA TATTCCCCAA GAAGAATTAC AAATTACTCA 1020
ATGGATAAAA GAATGGCATG GAGAATTTTT GCTTGAAAGA GATAATAGAT CAAAATTGCC 1080
AAAAAGTAAA TGTAAAAATA ATACATTATA TGAAGCATGT GAGAAGGAAT GTATTGATCC 1140
ATGTATGAAA TATAGAGATT GGATTATTAG AAGTAAATTT GAATGGCATA CGTTATCGAA 1200
AGAATATGAA ACTCAAAAAG TTCCAAAGGA AAATGCGGAA AATTATTTAA TCAAAATTTC 1260
AGAAAACAAG AATGATGCTA AAGTAAGTTT ATTATTGAAT AATTGTGATG CTGAATATTC 1320
AAAATATTGT GATTGTAAAC ATACTACTAC TCTCGTTAAA AGCGTTTTAA ATGGTAACGA 1380
CAATACAATT AAGGAAAAGC GTGAACATAT TGATTTAGAT GATTTTTCTA AATTTGGATG 1440
TGATAAAAAT TCCGTTGATA CAAACACAAA GGTGTGGGAA TGTAAAAACC CTTATATATT 1500
ATCCACTAAA GATGTATGTG TACCTCCGAG GAGGCAAGAA TTATGTCTTG GAAACATTGA 1560
TAGAATATAC GATAAAAACC TATTAATGAT AAAAGAGCAT ATTCTTGCTA TTGCAATATA 1620
TGAATCAAGA ATATTGAAAC GAAAATATAA GAATAAAGAT GATAAAGAAG TTTGTAAAAT 1680
CATAAATAAA ACTTTCGCTG ATATAAGAGA TATTATAGGA GGTACTGATT ATTGGAATGA 1740
TTTGAGCAAT AGAAAATTAG TAGGAAAAAT TAACACAAAT TCAAAATATG TTCACAGGAA 1800
T7UVAAAAAAT GATAAGCTTT TTCGTGATGA GTGGTGGAAA GTTATTAAAA AAGATGTATG 1860
GAATGTGATA TCATGGGTAT TCAAGGATAA AACTGTTTGT AAAGTU^GATG ATATTGAAAA 1920
TATACCACAA TTCTTCAGAT GGTTTAGTGA ATGGGGTGAT GATTATTGCC AGGATAAAAC 1980
AAAAATGATA GAGACTCTGA AGGTTGAATG CAAAGAAAAA CCTTGTGAAG ATGACAATTG 2040
TAAAAGTAAA TGTAATTCAT ATAAAGAATG GATATCAAAA AAAAAAGAAG AGTATT^TAA 2100
ACAAGCCAAA CAATACCAAG AATATCAAAA AGGAAATAAT TACAAAATGT ATTCTGAATT 2160
TAAATCTATA AAACCAGAAG TTTATTTAAA GAAATACTCG GAAAAATGTT CTAACCTAAA 2220
TTTCGAAGAT GAATTTAAGG AAGAATTACA TTCAGATTAT AAAAATAAAT GTACGATGTG 2280
TCCAGAAGTA AAGGATGTAC CAATTTCTAT AATAAGAAAT AATGAACAAA CTTCGCAAGA 2340
AGCAGTTCCT GAGGAAAACA CTGAAATAGC ACACAGAACG GAAACTCCAT CTATCTCTGA 2400
AGGACCAAAA GGAAATGAAC AAAAAGAACG TGATGACGAT AGTTTGAGTA AAATAAGTGT 2460
ATCACCAGAA AATTCAAGAC CTGAAACTGA TGCTAAAGAT ACTTCTAACT TGTTAAAATT 2520
AAAAGGAGAT GTTGATATTA GTATGCCTAA AGCAGTTATT GGGAGCAGTC CTAATGATAA 2580
TATAAATGTT ACTGAACAAG GGGATAATAT TTCCGGGGTG AATTCTAAAC CTTTATCTGA 2640
TGATGTACGT CCAGATAAAA AGGAATTAGA AGATCAAAAT AGTGATGAAT CGGAAGAAAC 2700
TGTAGTAAAT CATATATCAA AAAGTCCATC TATAAATAAT GGAGATGATT CAGGCAGTGG 2760
AAGTGCAACA GTGAGTGAAT CTAGTAGTTC AAATACTGGA TTGTCTATTG ATGATGATAG 2820
AAATGGTGAT ACATTTGTTC GAACACAAGA TACAGCAAAT ACTGAAGATG TTATTAGAAA 2880
AGAAAATGCT GACAAGGATG AAGATGA/^ AGGCGCAGAT GAAGAAAGAC ATAGTACTTC 2940
TGAAAGCTTA AGTTCACCTG AAGAAAAAAT GTTAACTGAT AATGAAGGAG GAAATAGTTT 3000
AAATCATGAA GAGGTGAAAG AACATACTAG TAATTCTGAT AATGTTCAAC AGTCTGGAGG 3060
AATTGTTAAT ATGAATGTTG AGAAAGAACT AAAAGATACT TTAGAAAATC CTTCTAGTAG 3120
CTTGGATGAA GGAAAAGCAC ATGAAGAATT ATCAGAACCA AATCTAAGCA GTGACCAAGA 3180
TATGTCTAAT ACACCTGGAC CTTTGGATAA CACCAGTGAA GAAACTACAG AAAGAATTAG 3240
TAATAATGAA TATAAAGTTA ACGAGAGGGA AGATGAGAGA ACGCTTACTA AGGAATATGA 3300
AGATATTGTT TTGAAAAGTC ATATGAATAG AGAATCAGAC GATGGTGAAT TATATGACGA 3360
TW^TTCAGAC TTATCTACTG TAAATGATGA ATCAGAAGAC GCTGAAGCAA AAATGAAAGG 3420
AAATGATACA TCTGAAATGT CGCATAATAG TAGTCAACAT ATTGAGAGTG ATCAACAGAA 3480
AAACGATATG AAAACTGTTG GTGATTTGGG AACCACACAT GTACAAAACG AAATTAGTGT 3540
TCCTGTTACA GGAGAAATTG ATGAAAAATT AAGGGAAAGT AAAGAATCAA AAATTCATAA 3600
GGCTGAAGAG GAAAGATTAA GTCATACAGA TATACATAAA ATTT^TCCTG AAGATAGAAA 3660
TAGTAATACA TTACATTTAA AAGATATAAG AAATGAGGAA AACGAAAGAC ACTTAACTAA 3720
TCAAAACATT AATATTAGTC AAGAAAGGGA TTTGCAAAAA CATGGATTCC ATACCATGAA 3780
TAATCTACAT GGAGATGGAG TTTCCGAAAG AAGTCAAATT AATCATAGTC ATCATGGAAA 3840
CAGACAAGAT CGGGGGGGAA ATTCTGGGAA TGTTTTAAAT ATGAGATCTA ATAATAATAA 3900
TTTTAATAAT ATTCCAAGTA GATATAATTT ATATGATAAA AAATTAGATT TAGATCTTTA 3960
TGAAAACAGA AATGATAGTA CAACAAAAGA ATTAATAAAG AAATTAGCAG AAATAAATAA 4020
ATGTGAGAAC GAAATTTCTG TAAAATATTG TGACCATATG ATTCATGAAG AAATCCCATT 4080
AAAAACATGC ACTAAAGAAA AAACAAGAAA TCTGTGTTGT GCAGTATCAG ATTACTGTAT 4140
GAGCTATTTT ACATATGATT CAGAGGAATA TTATAATTGT ACGAAAAGGG AATTTGATGA 4200
TCCATCTTAT ACATGTTTCA GAAAGGAGGC TTTTTCAAGT ATGATATTCA AATTTTTAAT 4260
AACAAATAAA ATATATTATT ATTTTTATAC TTACAAAACT GCAAAAGTAA CAATAAAAAA 4320
AATTAATTTC TCATTAATTT TTTTTTTCTT TTTTTCTTTT TAGGTATGCC ATATTATGCA 4380
GGAGCAGGTG TGTTATTTAT TATATTGGTT ATTTTAGGTG CTTCACAAGC CAAATATCAA 4440
AGGTTAGAAA AAATAAATAA AAATAAAATT GAGAAGAATG TAAATTT^T ATAGAATTCG 4500
AGCTCGG 4507

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1435 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



Met Lys Cys Asn Ile Ser Ile Tyr Phe Phe Ala Ser Phe Phe Val Leu
1               5                   10                  15

Tyr Phe Ala Lys Ala Arg Asn Glu Tyr Asp Ile Lys Glu Asn Glu Lys
            20                  25                  30

Phe Leu Asp Val Tyr Lys Glu Lys Phe Asn Glu Leu Asp Lys Lys Lys
        35                  40                  45

Tyr Gly Asn Val Gln Lys Thr Asp Lys Lys Ile Phe Thr Phe Ile Glu
    50                  55                  60

Asn Lys Leu Asp Ile Leu Asn Asn Ser Lys Phe Asn Lys Arg Trp Lys
65                  70                  75                  80

Ser Tyr Gly Thr Pro Asp Asn Ile Asp Lys Asn Met Ser Leu Ile Asn
                85                  90                  95

Lys His Asn Asn Glu Glu Met Phe Asn Asn Asn Tyr Gln Ser Phe Leu
            100                 105                 110

Ser Thr Ser Ser Leu Ile Lys Gln Asn Lys Tyr Val Pro Ile Asn Ala
        115                 120                 125

Val Arg Val Ser Arg Ile Leu Ser Phe Leu Asp Ser Arg Ile Asn Asn
    130                 135                 140

Gly Arg Asn Thr Ser Ser Asn Asn Glu Val Leu Ser Asn Cys Arg Glu
145                 150                 155                 160

Lys Arg Lys Gly Met Lys Trp Asp Cys Lys Lys Lys Asn Asp Arg Ser
                165                 170                 175

Asn Tyr Val Cys Ile Pro Asp Arg Arg Ile Gln Leu Cys Ile Val Asn
            180                 185                 190

Leu Ser Ile Ile Lys Thr Tyr Thr Lys Glu Thr Met Lys Asp His Phe
        195                 200                 205

Ile Glu Ala Ser Lys Lys Glu Ser Gln Leu Leu Leu Lys Lys Asn Asp
    210                 215                 220

Asn Lys Tyr Asn Ser Lys Phe Cys Asn Asp Leu Lys Asn Ser Phe Leu
225                 230                 235                 240

Asp Tyr Gly His Leu Ala Met Gly Asn Asp Met Asp Phe Gly Gly Tyr
                245                 250                 255

Ser Thr Lys Ala Glu Asn Lys Ile Gln Glu Val Phe Lys Gly Ala His
            260                 265                 270

Gly Glu Ile Ser Glu His Lys Ile Lys Asn Phe Arg Lys Glu Trp Trp
        275                 280                 285

Asn Glu Phe Arg Glu Lys Leu Trp Glu Ala Met Leu Ser Glu His Lys
    290                 295                 300

Asn Asn Ile Asn Asn Cys Lys Asn Ile Pro Gln Glu Glu Leu Gln Ile
305                 310                 315                 320

Thr Gln Trp Ile Lys Glu Trp His Gly Glu Phe Leu Leu Glu Arg Asp
                325                 330                 335

Asn Arg Ser Lys Leu Pro Lys Ser Lys Cys Lys Asn Asn Thr Leu Tyr
            340                 345                 350

Glu Ala Cys Glu Lys Glu Cys Ile Asp Pro Cys Met Lys Tyr Arg Asp
        355                 360                 365

Trp Ile Ile Arg Ser Lys Phe Glu Trp His Thr Leu Ser Lys Glu Tyr
    370                 375                 380
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Glu 
385 
He 

5 

Cys 

Leu 

10 Arg 

Asn 
465 
He 

15 

Cys 
Lys 

20 Arg 

Lys 
545 
Asn 

25 

Lys 

Trp 

30 Phe 

Gin 
625 
Lys 

35 

Cys 
He 

40 Glu 

He 
705 
Leu 

45 

Asn 
He 

50 Thr 

Lys 
785 
Ser 

55 

Ser 

Ala 

60 Gly 

Arg 
865 
Glu 



370 

Thr Gin Lys 

Ser Glu Asn 

Asp Ala Glu 
420 

Val Lys Ser 

435 
Glu His He 
450 

Ser Val Asp 

Leu Ser Thr 

Leu Gly Asn 
500 

Glu His He 

515 
Lys Tyr Lys 
530 

Thr Phe Ala 

Asp Leu Ser 

Tyr Val His 
580 

Trp Lys Val 

595 
Lys Asp Lys 
610 

Phe Phe Arg 

Thr Lys Met 

Glu Asp Asp 
660 

Ser Lys Lys 
675 

Tyr Gin Lys 
690 

Lys Pro Glu 

Asn Phe Glu 

Lys Cys Thr 
740 

Arg Asn Asn 

755 
Glu He Ala 
770 

Gly Asn Glu 

Val Ser Pro 

Asn Leu Leu 
820 

Val He Gly 

835 
Asp Asn He 
850 

Pro Asp Lys 
Thr Val Val 



375 

Val Pro Lys 

390 
Lys Asn Asp 
405 

Tyr Ser Lys 

Val Leu Asn 

Asp Leu Asp 
455 

Thr Asn Thr 

470 
Lys Asp Val 
485 

He Asp Arg 

Leu Ala He 

Asn Lys Asp 
535 

Asp He Arg 
550 

Asn Arg Lys 
565 

Arg Asn Lys 

He Lys Lys 

Thr Val Cys 
615 

Trp Phe Ser 
630 

He Glu Thr 
645 

Asn Cys Lys 

Lys Glu Glu 

Gly Asn Asn 
695 

Val Tyr Leu 
710 

Asp Glu Phe 
725 

Met Cys Pro 

Glu Gin Thr 

His Arg Thr 
775 

Gin Lys Glu 
790 

Glu Asn Ser 
805 

Lys Leu Lys 

Ser Ser Pro 

Ser Gly Val 
855 

Lys Glu Leu 
870 

Asn His He 



Glu Asn Ala 

Ala Lys Val 
410 

Tyr Cys Asp 

425 
Gly Asn Asp 
440 

Asp Phe Ser 

Lys Val Trp 

Cys Val Pro 
490 

He Tyr Asp 
505 

Ala He Tyr 
520 

Asp Lys Glu 

Asp He He 

Leu Val Gly 
570 

Lys Asn Asp 
585 

Asp Val Trp 
600 

Lys Glu Asp 

Glu Trp Gly 

Leu Lys Val 
650 

Ser Lys Cys 
665 

Tyr Asn Lys 
680 

Tyr Lys Met 

Lys Lys Tyr 

Lys Glu Glu 
730 

Glu Val Lys 
745 

Ser Gin Glu 
760 

Glu Thr Pro 

Arg Asp Asp 

Arg Pro Glu 
810 

Gly Asp Val 
825 

Asn Asp Asn 
840 

Asn Ser Lys 
Glu Asp Gin 
Ser Lys Ser 



380 
Glu Asn Tyr 
395 

Ser Leu Leu 

Cys Lys His 

Asn Thr He 
445 

Lys Phe Gly 

460 
Glu Cys Lys 
475 

Pro Arg Arg 

Lys Asn Leu 

Glu Ser Arg 
525 

Val Cys Lys 

540 
Gly Gly Thr 
555 

Lys He Asn 

Lys Leu Phe 

Asn Val He 
605 

Asp He Glu 

620 
Asp Asp Tyr 
635 

Glu Cys Lys 

Asn Ser Tyr 

Gin Ala Lys 
685 

Tyr Ser Glu 

700 
Ser Glu Lys 
715 

Leu His Ser 

Asp Val Pro 

Ala Val Pro 
765 

Ser He Ser 

780 
Asp Ser Leu 
795 

Thr Asp Ala 

Asp He Ser 

He Asn Val 
845 

Pro Leu Ser 

860 
Asn Ser Asp 
875 

Pro Ser He 



Leu He Lys 
400 

Leu Asn Asn- 

415 
Thr Thr Thr 
430 

Lys Glu Lys 

Cys Asp Lys 

Asn Pro Tyr 
480 

Gin Glu Leu 

495 
Leu Met He 
510 

He Leu Lys 

He He Asn 

Asp Tyr Trp 
560 

Thr Asn Ser 

575 
Arg Asp Glu 
590 

Ser Trp Val 

Asn He Pro 

Cys Gin Asp 
640 

Glu Lys Pro 

655 
Lys Glu Trp 
670 

Gin Tyr Gin 

Phe Lys Ser 

Cys Ser Asn 
720 

Asp Tyr Lys 

735 
He Ser He 
750 

Glu Glu Asn 

Glu Gly Pro 

Ser Lys He 
800 

Lys Asp Thr 

815 
Met Pro Lys 
830 

Thr Glu Gin 

Asp Asp Val 

Glu Ser Glu 
880 

Asn Asn Gly 



WO 96/40766 






PCT/US96/09508 














8 8 5 










Q o n 

ti y u 










895 




Asp 


Asp 

XT 


Ser 


Gly 


Ser 


Gly 


Ser 


Ala 


Thr 


V CL X 


Ofc: XT 


vjXU 


Ser 


Ser 


Ser 


Ser 




Thr 




900 










one 
y u D 










910 


Asn 


Gly 


Leu 


Ser 


He 


Asp 






Arg 


Asn 


Gly 


Asp 


Thr 


Phe 


Val 






915 










Q o n 

7 ^ U 








92 5 










Thr 


Gin 


Asp 


Thr 


Ala 




Thy 


VjXU 


Asp 


V ai 


lie 


Arg 


Lys 


Glu 


Asn 


Ala 


93 0 










935 










940 






Asp 


Lys 


Asp 


Glu 


Asp 


Glu 


Lys 


J. y 


a 


Asp 


Glu 


Glu 


Arg 


His 


Ser 


945 










950 










955 








960 


Thr 


Ser 


Glu 


Ser 


T 1 


Ser 


Ser 


Pro 


± u 


CjXU 


Lys 


Met 


Leu 


Thr 


Asp 


Asn 


Glu 








^ \j ~j 










OTA 










975 




Gly 


Gly 


Asn 


Ser 


Leu 


Asn 


His 




oXU 


Val 


Lys 


Glu 


His 


Thr 


Ser 








980 










985 








990 






Asn 


Ser 


Asp 


Asn 


Val 


Gin 


Gin 


Ser 


Gly 


Gly 


He 


Val 


Asn 


Met 


Asn 


Val 


Glu 




995 








1000 








1005 






Lys 


Glu 


Leu 


Lys 


Asp 


Thr 


Leu 


Glu 


Asn 


Pro 


Ser 


Ser 


Ser 


Leu 


Asp 


1010 








1015 








1020 








Glu 


Gly 


Lys 


Ala 


His 


Glu 


Glu 


Leu 


Ser 


Glu 


Pro 


Asn 


Leu 


Ser 


Ser 


Asp 


1025 








1030 








1035 








1040 


Gin 


Asp 


Met 


Ser 


Asn 


Thr 


Pro 


Gly 


Pro 


Leu 


Asp 


Asn 


Thr 


Ser 


Glu 


Glu 


Thr 






1045 








1050 








1055 




Thr 


Glu 


Arg 


He 


Ser 


Asn 


Asn 


Glu 


Tyr 


Lys 


Val 


Asn 


Glu 


Arg 


Glu 






1060 








1065 








1070 




Asp 


Glu 


Arg 


Thr 


Leu 


Thr 


Lys 


Glu 


Tyr 


Glu 


Asp 


He 


Val 


Leu 


Lys 


Ser 


His 


1075 








1080 






1085 






Met 


Asn 


Arg 


Glu 


Ser 


Asp 


Asp 


Gly 


Glu 


Leu 


Tyr 


Asp 


Glu 


Asn 


Ser 


1090 








1095 








1100 








Asp 


Leu 


Ser 


Thr 


Val 


Asn 


Asp 


Glu 


Ser 


Glu 


Asp 


Ala 


Glu 


Ala 


Lys 


Met 


1105 . 








1110 








1115 






1120 


Lys 


Gly 


Asn 


Asp 


Thr 


Ser 


Glu 


Met 


Ser 


His 


Asn 


Ser 


Ser 


Gin 


His 


He 


Glu 






1125 








1130 








1135 


Ser 


Asp 


Gin 


Gin 


Lys 


Asn 


Asp Met 


Lys 


Thr 


Val 


Gly 

] 


Asp 


Leu 


Gly 






1140 








1145 








L150 




Thr 


Thr 


His 


Val 


Gin 


Asn 


Glu 


He 


Ser 


Val 


Pro 


Val 


Thr 


Gly 


Glu 


He 




1155 








1160 








1165 






Asp 


Glu 


Lys 


Leu 


Arg 


Glu 


Ser- 


Lys 


Glu 


Ser 


Lys 


He 


His 


Lys 


Ala 


Glu 


1170 








1175 








1180 








Glu 


Glu 


Arg 


Leu 


Ser 


His 


Thr 


Asp 


He 


His 


Lys 


He 


Asn 


Pro 


Glu 


Asp 


1185 








1190 








1195 








1200 


Arg 


Asn 


Ser 


Asn 


Thr 


Leu 


His 


Leu 


Lys 


Asp 


He 


Arg 


Asn 


Glu 


Glu 


Asn 


Glu 






1205 








1210 






1215 




Arg 


His 


Leu 


Thr 


Asn 


Gin 


Asn 


He 


Asn 


He 


Ser 


Gin 


Glu 


Arg 


Asp 






1220 








1225 








1230 


Leu 


Gin 


Lys 


His 


Gly 


Phe 


His 


Thr 


Met 


Asn 


Asn 


Leu 


His 


Gly 


Asp 


Gly 




1235 








1240 








1245 


Val 


Ser 


Glu 


Arg 


Ser 


Gin 


He 


Asn 


His 


Ser 


His 


His 


Gly 


Asn 


Arg 


Gin 


1250 








1255 








1260 






Asp 


Arg 


Gly 


Gly 


Asn 


Ser 


Gly Asn 


Val 


Leu 


Asn 


Met 


Arg 


Ser 


Asn 


Asn 


1265 








1270 








1275 






1280 


Asn 


Asn 


Phe 


Asn 


Asn 


He 


Pro 


Ser 


Arg 


Tyr 


Asn* 


Leu 


Tyr 


Asp 


Lys 


Lys 








1285 








1290 








L295 


Leu 


Asp 


Leu 


Asp 


Leu 


Tyr 


Glu 


Asn 


Arg 


Asn 


Asp 


Ser 


Thr 


Thr 


Lys 








1300 








1305 








1310 






•r -I _ 
-U -L ^ 


T _ _ ^ 

o-i jr lis 


T , ....^ 




Ala 


Glu 


He 




T .\rc; 
— J — 


~ J — 


Glu 


Asn 


GIri 








1315 








1320 








1325 








Val 


Lys 


Tyr 


Cys 


Asp 


His 


Met 


He 


His 


Glu 


Glu 


He 


Pro 


Leu 


Lys 


Thr 


1330 








1335 








1340 








Cys 


Thr 


Lys 


Glu 


Lys 


Thr 


Arg 


Asn 


Leu 


Cys 


Cys 


Ala 


Val 


Ser 


Asp 


Tyr 


1345 








1350 








1355 






1360 


Cys 


Met 


Ser 


Tyr 


Phe 


Thr 


Tyr 


Asp 


Ser 


Glu 


Glu 


Tyr 


Tyr 


Asn 


Cys 


Thr 








1365 








1370 








1375 




Lys 


Arg 


Glu 


Phe 


Asp 


Asp 


Pro 


Ser 


Tyr 


Thr 


Cys 


Phe 


Arg 


Lys 


Glu 


Ala 






1380 








1385 








.390 






Phe 


Ser 


Ser 


Met 


He 


Phe 


Lys 


Phe 


Leu 


He 


Thr 


Asn 


Lys 


He 


Tyr 


Tyr 
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20 



25 



30 



40 



1395 1400 1405 

Tyr Phe Tyr Thr Tyr Lys Thr Ala Lys Val Thr Ile Lys Lys Ile Asn 

1410 1415 1420 

Phe Ser Leu Ile Phe Phe Phe Phe Phe Ser Phe - - 

1425 1430 1435 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2288 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



CACTTTATGC TTCCGGCTCG TATGTTGTGT GGAATTGTGA GCGGATAACA ATTTCACACA 60
GGAAACAGCT ATGACCATGA TTACGCCAAG CTCTAATACG ACTCACTATA GGGAAAGCTG 120
GTACGCCTGC AGGTCCGGTC CGGAATTCAA TAAAATATTT CCAGAAAGGA ATGTGCAAAT 180
TCACATATCC AATATATTCA AGGAATATAA AGAAAATAAT GTAGATATCA TATTTGGAAC 240
GTTGAATTAT GAATATAATA ATTTCTGTAA AGAAAAACCT GAATTAGTAT CTGCTGCCAA 300
GTATAATCTG AAAGCTCCAA ATGCTAAATC CCCTAGAATA TACAAATCTA AGGAGCATGA 360
AGAATCAAGT GTGTTTGGTT GCAAAACGAA AATCAGTAAA GTTAAAAAAA AATGGAATTG 420
TTATAGTAAT AATAAAGTAA CTAAACCTGA AGGTGTATGT GGACCACCAA GAAGGCAACA 480
ATTATGTCTT GGATATATAT TTTTGATTCG CGACGGTAAC GAGGAAGGAT TAAAAGATCA 540
TATTAATAAG GCAGCTAATT ATGAGGCAAT GCATTTAAAA GAGAAATATG AGAATGCTGG 600
TGGTGATAAA ATTTGCAATG CTATATTGGG AAGTTATGCA GATATTGGAG ATATTGTAAG 660
AGGTTTGGAT GTTTGGAGGG ATATAAATAC TAATAAATTA TCAGAAAAAT TCCAAAAAAT 720
TTTTATGGGT GGTGGTAATT CTAGGAAAAA ACAAAACGAT AATAATGAAC GTAATAAATG 780
GTGGGAAAAA CAAAGGAATT TAATATGGTC TAGTATGGTA AAACACATTC CAAAAGGAAA 840
AACATGTAAA CGTCATAATA ATTTTGAGAA AATTCCTCAA TTTTTGAGAT GGTTAAAAGA 900
ATGGGGTGAT GAATTTTGTG AGGAAATGGG TACGGAAGTC AAGCAATTAG AGAAAATATG 960
TGAAAATAAA AATTGTTCGG AAAAAAAATG TAAAAATGCA TGTAGTTCCT ATGAAAAATG 1020
GATAAAGGAA CGAAAAAATG AATATAATTT GCAATCAAAG AAATTTGATA GTGATAAAAA 1080
ATTAAATAAA AAAAACAATC TTTATAATAA ATTTGAGGAT TCTAAAGCTT ATTTAAGGAG 1140
TGAATCAAAA CAGTGCTCAA ATATAGAATT TAATGATGAA ACATTTACAT TTCCTAATAA 1200
ATATAAAGAG GCTTGTATGG TATGTGAAAA TCCTTCATCT TCGAAAGCTC TTAAACCTAT 1260
AAAAACGAAT GTGTTTCCTA TAGAGGAATC AAAAAAATCT GAGTTATCAA GTTTAACAGA 1320
TAAATCTAAG AATACTCCTA ATAGTTCTGG TGGGGGAAAT TATGGAGATA GACAAATATC 1380
AAAAAGAGAC GATGTTCATC ATGATGGTCC TAAGGAAGTG AAATCCGGAG AAAAAGAGGT 1440
ACCAAAAATA GATGCAGCTG TTAAAACAGA AAATGAATTT ACCTCTAATC GAAACGATAT 1500
TGAAGGAAAG GAAAAAAGTA AAGGTGATCA TTCTTCTCCT GTTCATTCTA AAGATATAAA 1560
AAATGAGGAA CCACAAAGGG TGGTGTCTGA AAATTTACCT AAAATTGAAG AGAAAATGGA 1620
ATCTTCTGAT TCTATACCAA TTACTCATAT AGAAGCTGAA AAGGGTCAGT CTTCTAATTC 1680
TAGCGATAAT GATCCTGCAG TAGTAAGTGG TAGAGAATCT AAAGATGTAA ATCTTCATAC 1740
TTCTGAAAGG ATTAAAGAAA ATGAAGAAGG TGTGATTAAA ACAGATGATA GTTCAAAAAG 1800
TATTGAAATT TCTAAAATAC CATCTGACCA AAATAATCAT AGTGATTTAT CACAGAATGC 1860
AAATGAGGAC TCTAATCAAG GGAATAAGGA AACAATAAAT CCTCCTTCTA CAGAAAAAAA 1920
TCTCAAAGAA ATTCATTATA AAACATCTGA TTCTGATGAT CATGGTTCTA AAATTAAAAG 1980
TGAAATTGAA CCAAAGGAGT TAACGGAGGA ATCACCTCTT ACTGATAAAA AAACTGAAAG 2040
TGCAGCGATT GGTGATAAAA ATCATGAATC AGTAAAAAGC GCTGATATTT TTCAATCTGA 2100
GATTCATAAT TCTGATAATA GAGATAGAAT TGTTTCTGAA AGTGTAGTTC AGGATTCTTC 2160
AGGAAGCTCT ATGAGTACTG AATCTATACG TACTGATAAC AAGGATTTTA AAACAAGTGA 2220
GGATATTGCA CCTTCTATTA ATGGTCGGAA TTCCCGGGTC GACGAGCTCA CTAGTCGGCG 2280
GCCGCTCT 2288



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 749 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ala Asp Asn Asn Phe Thr Gln Glu Thr Ala Met Thr Met Ile Thr Pro
1 5 10 15

Ser Ser Asn Thr Thr His Tyr Arg Glu Ser Trp Tyr Ala Cys Arg Ser
20 25 30

Gly Pro Glu Phe Asn Lys Ile Phe Pro Glu Arg Asn Val Gln Ile His
35 40 45

Ile Ser Asn Ile Phe Lys Glu Tyr Lys Glu Asn Asn Val Asp Ile Ile
50 55 60

Phe Gly Thr Leu Asn Tyr Glu Tyr Asn Asn Phe Cys Lys Glu Lys Pro
65 70 75 80

Glu Leu Val Ser Ala Ala Lys Tyr Asn Leu Lys Ala Pro Asn Ala Lys
85 90 95

Ser Pro Arg Ile Tyr Lys Ser Lys Glu His Glu Glu Ser Ser Val Phe
100 105 110

Gly Cys Lys Thr Lys Ile Ser Lys Val Lys Lys Lys Trp Asn Cys Tyr
115 120 125

Ser Asn Asn Lys Val Thr Lys Pro Glu Gly Val Cys Gly Pro Pro Arg
130 135 140

Arg Gln Gln Leu Cys Leu Gly Tyr Ile Phe Leu Ile Arg Asp Gly Asn
145 150 155 160

Glu Glu Gly Leu Lys Asp His Ile Asn Lys Ala Ala Asn Tyr Glu Ala
165 170 175

Met His Leu Lys Glu Lys Tyr Glu Asn Ala Gly Gly Asp Lys Ile Cys
180 185 190

Asn Ala Ile Leu Gly Ser Tyr Ala Asp Ile Gly Asp Ile Val Arg Gly
195 200 205

Leu Asp Val Trp Arg Asp Ile Asn Thr Asn Lys Leu Ser Glu Lys Phe
210 215 220

Gln Lys Ile Phe Met Gly Gly Gly Asn Ser Arg Lys Lys Gln Asn Asp
225 230 235 240

Asn Asn Glu Arg Asn Lys Trp Trp Glu Lys Gln Arg Asn Leu Ile Trp
245 250 255

Ser Ser Met Val Lys His Ile Pro Lys Gly Lys Thr Cys Lys Arg His
260 265 270

Asn Asn Phe Glu Lys Ile Pro Gln Phe Leu Arg Trp Leu Lys Glu Trp
275 280 285

Gly Asp Glu Phe Cys Glu Glu Met Gly Thr Glu Val Lys Gln Leu Glu
290 295 300

Lys Ile Cys Glu Asn Lys Asn Cys Ser Glu Lys Lys Cys Lys Asn Ala
305 310 315 320

Cys Ser Ser Tyr Glu Lys Trp Ile Lys Glu Arg Lys Asn Glu Tyr Asn
325 330 335

Leu Gln Ser Lys Lys Phe Asp Ser Asp Lys Lys Leu Asn Lys Lys Asn
340 345 350

Asn Leu Tyr Asn Lys Phe Glu Asp Ser Lys Ala Tyr Leu Arg Ser Glu
355 360 365

Ser Lys Gln Cys Ser Asn Ile Glu Phe Asn Asp Glu Thr Phe Thr Phe
370 375 380

Pro Asn Lys Tyr Lys Glu Ala Cys Met Val Cys Glu Asn Pro Ser Ser
385 390 395 400

Ser Lys Ala Leu Lys Pro Ile Lys Thr Asn Val Phe Pro Ile Glu Glu
405 410 415

Ser Lys Lys Ser Glu Leu Ser Ser Leu Thr Asp Lys Ser Lys Asn Thr
420 425 430

Pro Asn Ser Ser Gly Gly Gly Asn Tyr Gly Asp Arg Gln Ile Ser Lys
435 440 445

Arg Asp Asp Val His His Asp Gly Pro Lys Glu Val Lys Ser Gly Glu
450 455 460

Lys Glu Val Pro Lys Ile Asp Ala Ala Val Lys Thr Glu Asn Glu Phe
465 470 475 480

Thr Ser Asn Arg Asn Asp Ile Glu Gly Lys Glu Lys Ser Lys Gly Asp
485 490 495

His Ser Ser Pro Val His Ser Lys Asp Ile Lys Asn Glu Glu Pro Gln
500 505 510

Arg Val Val Ser Glu Asn Leu Pro Lys Ile Glu Glu Lys Met Glu Ser
515 520 525

Ser Asp Ser Ile Pro Ile Thr His Ile Glu Ala Glu Lys Gly Gln Ser
530 535 540

Ser Asn Ser Ser Asp Asn Asp Pro Ala Val Val Ser Gly Arg Glu Ser
545 550 555 560

Lys Asp Val Asn Leu His Thr Ser Glu Arg Ile Lys Glu Asn Glu Glu
565 570 575

Gly Val Ile Lys Thr Asp Asp Ser Ser Lys Ser Ile Glu Ile Ser Lys
580 585 590

Ile Pro Ser Asp Gln Asn Asn His Ser Asp Leu Ser Gln Asn Ala Asn
595 600 605

Glu Asp Ser Asn Gln Gly Asn Lys Glu Thr Ile Asn Pro Pro Ser Thr
610 615 620

Glu Lys Asn Leu Lys Glu Ile His Tyr Lys Thr Ser Asp Ser Asp Asp
625 630 635 640

His Gly Ser Lys Ile Lys Ser Glu Ile Glu Pro Lys Glu Leu Thr Glu
645 650 655

Glu Ser Pro Leu Thr Asp Lys Lys Thr Glu Ser Ala Ala Ile Gly Asp
660 665 670

Lys Asn His Glu Ser Val Lys Ser Ala Asp Ile Phe Gln Ser Glu Ile
675 680 685

His Asn Ser Asp Asn Arg Asp Arg Ile Val Ser Glu Ser Val Val Gln
690 695 700

Asp Ser Ser Gly Ser Ser Met Ser Thr Glu Ser Ile Arg Thr Asp Asn
705 710 715 720

Lys Asp Phe Lys Thr Ser Glu Asp Ile Ala Pro Ser Ile Asn Gly Arg
725 730 735

Asn Ser Arg Val Asp Glu Leu Thr Ser Arg Arg Pro Leu
740 745
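
The correspondence between the nucleotide listing of SEQ ID NO: 5 and the amino acid listing of SEQ ID NO: 6 can be checked by translation. The following is a minimal sketch only, in Python, and is not part of the disclosure as filed. It assumes the standard genetic code and a reading frame beginning at a 0-based offset of 40 in SEQ ID NO: 5; that offset is inferred from comparing the two listings and is not stated in the specification. The names `clean_listing` and `translate` are hypothetical helpers introduced for illustration.

```python
import re

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def clean_listing(text):
    """Drop position numbers, spaces and line breaks from a printed listing."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def translate(dna, offset=0):
    """Translate complete codons starting at a 0-based offset.

    Codons containing anything other than A, C, G or T are reported as Xaa.
    """
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        one = CODON_TABLE.get(dna[i:i + 3])
        residues.append(THREE_LETTER[one] if one else "Xaa")
    return residues


# First two printed rows of SEQ ID NO: 5 (bases 1 to 120).
first_rows = """
CACTTTATGC TTCCGGCTCG TATGTTGTGT GGAATTGTGA GCGGATAACA ATTTCACACA 60
GGAAACAGCT ATGACCATGA TTACGCCAAG CTCTAATACG ACTCACTATA GGGAAAGCTG 120
"""
dna = clean_listing(first_rows)
# Assumed frame: offset 40, inferred from the listings, not from the text.
print(" ".join(translate(dna, offset=40)[:16]))
# prints: Ala Asp Asn Asn Phe Thr Gln Glu Thr Ala Met Thr Met Ile Thr Pro
```

Under these assumptions the first sixteen translated residues agree with residues 1 to 16 of SEQ ID NO: 6. Run over all 2288 bases, the same frame would give 749 complete codons, which matches the stated length of SEQ ID NO: 6.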

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2606 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

AGCTCTATTA CGACTCACTA TAGGGAAAGC TGGTACGCCT GCAGGTACCG GTCCGGAATT 60
CCCGGGTCGA CGAGCTCACT AGTCGGCGGC CGCTCTAGAG GATCCAAGCT TAATAGTGTT 120
TATACGTCTA TTGGCTTATT TTTAAATAGC TTAAAAAGCG GACCATGTAA AAAGGATAAT 180
GATAATGCAG AGGATAATAT AGATTTTGGT GATGAAGGTA AAACATTTAA AGAGGCAGAT 240
AATTGTAAAC CATGTTCTCA ATTTACTGTT GATTGTAAAA ATTGTAATGG TGGTGATACA 300
AAAGGGAAGT GCAATGGCAG CAATGGCAAA AAGAATGGAA ATGATTATAT TACTGCAAGT 360
GATATTGAAA ATGGAGGGAA TTCTATTGGA AATATAGATA TGGTTGTTAG TGATAAGGAT 420
GCAAATGGAT TTAATGGTTT AGACGCTTGT GGAAGTGCAA ATATCTTTAA AGGTATTAGA 480
AAAGAACAAT GGAAATGTGC TAAAGTATGT GGTTTAGATG TATGTGGTCT TAAAAATGGT 540
AATGGTAGTA TAGATAAAGA TCAAAAACAA ATTATAATTA TTAGAGCATT GCTTAAACGT 600
TGGGTAGAAT ATTTTTTAGA AGATTATAAT AAAATTAATG CCAAAATTTC ACATTGTACG 660
AAAAAGGATA ATGAATCCAC ATGTACAAAT GATTGTCCAA ATAAATGTAC ATGTGTAGAA 720
GAGTGGATAA ATCAGAAAAG GACAGAATGG AAAAATATAA AAAAACATTA CAAAACACAA 780
AATGAAAATG GTGACAATAA CATGAAATCT TTGGTTACAG ATATTTTGGG TGCCTTGCAA 840
CCCCAAAGTG ATGTTAACAA AGCTATAAAA CCTTGTAGTG GTTTAACTGC GTTCGAGAGT 900
TTTTGTGGTC TTAATGGCGC TGATAACTCA GAAAAAAAAG AAGGTGAAGA TTACGATCTT 960
GTTCTATGTA TGCTTAAAAA TCTTGAAAAA CAAATTCAGG AGTGCAAAAA GAAACATGGC 1020
GAAACTAGTG TCGAAAATGG TGGCAAATCA TGTACCCCCC TTGACAACAC CACCCTTGAG 1080
GAGGAACCCA TAGAAGAGGA AAACCAAGTG GAAGCGCCGA ACATTTGTCC AAAACAAACA 1140
GTGGAAGATA AAAAAAAAGA GGAAGAAGAA GAAACTTGTA CACCGGCATC ACCAGTACCA 1200
GAAAAACCGG TACCTCATGT GGCACGTTGG CGAACATTTA CACCACCTGA GGTATTCAAG 1260
ATATGGAGGG GAAGGAGAAA TAAAACTACG TGCGAAATAG TGGCAGAAAT GCTTAAAGAT 1320
AAGAATGGAA GGACTACAGT AGGTGAATGT TATAGAAAAG AAACTTATTC TGAATGGACG 1380
TGTGATGAAA GTAAGATTAA AATGGGACAG CATGGAGCAT GTATTCCTCC AAGAAGACAA 1440
AAATTATGTT TACATTATTT AGAAAAAATA ATGACAAATA CAAATGAATT GAAATACGCA 1500
TTTATTAAAT GTGCTGCAGC AGAAACTTTT TTGTTATGGC AAAACTACAA AAAAGATAAG 1560
AATGGTAATG CAGAAGATCT CGATGAAAAA TTAAAAGGTG GTATTATCCC CGAAGATTTT 1620
AAACGGCAAA TGTTCTATAC GTTTGCAGAT TATAGAGATA TATGTTTGGG TACGGATATA 1680
TCATCAAAAA AAGATACAAG TAAAGGTGTA GGTAAAGTAA AATGCAATAT TGATGATGTT 1740
TTTTATAAAA TTAGCAATAG TATTCGTTAC CGTAAAAGTT GGTGGGAAAC AAATGGTCCA 1800
GTTATATGGG AAGGAATGTT ATGCGCTTTA AGTTATGATA CGAGCCTAAA TAATGTTAAT 1860
CCGGAAACTC ACAAAAAACT TACCGAAGGC AATAACAACT TTGAGAAAGT CATATTTGGT 1920
AGTGATAGTA GCACTACTTT GTCCAAATTT TCTGAAAGAC CTCAATTTCT AAGATGGTTG 1980
ACTGAATGGG GAGAAAATTT CTGCAAAGAA CAAAAAAAGG AGTATAAGGT GTTGTTGGCA 2040
AAATGTAAGG ATTGTGATGT TGATGGTGAT GGTAAATGTA ATGGAAAATG TGTTGCGTGC 2100
AAAGATCAAT GTAAACAATA TCATAGTTGG ATTGGAATAT GGATAGATAA TTATAAAAAA 2160
CAAAAAGGAA GATATACTGA GGTTAAAAAA ATACCTCTGT ATAAAGAAGA TAAAGACGTG 2220
AAAAACTCAG ATGATGCTCG CGATTATTTA AAAACACAAT TACAAAATAT GAAATGTGTA 2280
AATGGAACTA CTGATGAAAA TTGTGAGTAT AAGTGTATGC ATAAAACCTC ATCCACAAAT 2340
AGTGATATGC CCGAATCGTT GGACGAAAAG CCGGAAAAGG TCAAAGACAA GTGTAATTGT 2400
GTACCTAATG AATGCAATGC ATTGAGTGTA AGTGGTAGCG GTTTTCCTGA TGGTCAAGCT 2460
TACGTACGCG TGCATGCGAC GTCATAGCTC TTCTATAGTG TCACCTAAAT TCAATTCACT 2520
GGCCGTCGTT TTACAACGTC GTGACTGGGA AAACCTGGCG TTACCCAACT TAATCGCCTT 2580
GCAGCACATC CCCCTTTCGC CAGCTG 2606



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 921 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Leu Asn Ser Val Tyr Thr Ser Ile Gly Leu Phe Leu Asn Ser Leu
1 5 10 15

Lys Ser Gly Pro Cys Lys Lys Asp Asn Asp Asn Ala Glu Asp Asn Ile
20 25 30

Asp Phe Gly Asp Glu Gly Lys Thr Phe Lys Glu Ala Asp Asn Cys Lys
35 40 45

Pro Cys Ser Gln Phe Thr Val Asp Cys Lys Asn Cys Asn Gly Gly Asp
50 55 60

Thr Lys Gly Lys Cys Asn Gly Ser Asn Gly Lys Lys Asn Gly Asn Asp
65 70 75 80

Tyr Ile Thr Ala Ser Asp Ile Glu Asn Gly Gly Asn Ser Ile Gly Asn
85 90 95

Ile Asp Met Val Val Ser Asp Lys Asp Ala Asn Gly Phe Asn Gly Leu
100 105 110

Asp Ala Cys Gly Ser Ala Asn Ile Phe Lys Gly Ile Arg Lys Glu Gln
115 120 125

Trp Lys Cys Ala Lys Val Cys Gly Leu Asp Val Cys Gly Leu Lys Asn
130 135 140

Gly Asn Gly Ser Ile Asp Lys Asp Gln Lys Gln Ile Ile Ile Ile Arg
145 150 155 160

Ala Leu Leu Lys Arg Trp Val Glu Tyr Phe Leu Glu Asp Tyr Asn Lys
165 170 175

Ile Asn Ala Lys Ile Ser His Cys Thr Lys Lys Asp Asn Glu Ser Thr
180 185 190

Cys Thr Asn Asp Cys Pro Asn Lys Cys Thr Cys Val Glu Glu Trp Ile
195 200 205

Asn Gln Lys Arg Thr Glu Trp Lys Asn Ile Lys Lys His Tyr Lys Thr
210 215 220

Gln Asn Glu Asn Gly Asp Asn Asn Met Lys Ser Leu Val Thr Asp Ile
225 230 235 240

Leu Gly Ala Leu Gln Pro Gln Ser Asp Val Asn Lys Ala Ile Lys Pro
245 250 255

Cys Ser Gly Leu Thr Ala Phe Glu Ser Phe Cys Gly Leu Asn Gly Ala
260 265 270

Asp Asn Ser Glu Lys Lys Glu Gly Glu Asp Tyr Asp Leu Val Leu Cys
275 280 285

Met Leu Lys Asn Leu Glu Lys Gln Ile Gln Glu Cys Lys Lys Lys His
290 295 300

Gly Glu Thr Ser Val Glu Asn Gly Gly Lys Ser Cys Thr Pro Leu Asp
305 310 315 320

Asn Thr Thr Leu Glu Glu Glu Pro Ile Glu Glu Glu Asn Gln Val Glu
325 330 335

Ala Pro Asn Ile Cys Pro Lys Gln Thr Val Glu Asp Lys Lys Lys Glu
340 345 350

Glu Glu Glu Glu Thr Cys Thr Pro Ala Ser Pro Val Pro Glu Lys Pro
355 360 365

Val Pro His Val Ala Arg Trp Arg Thr Phe Thr Pro Pro Glu Val Phe
370 375 380

Lys Ile Trp Arg Gly Arg Arg Asn Lys Thr Thr Cys Glu Ile Val Ala
385 390 395 400

Glu Met Leu Lys Asp Lys Asn Gly Arg Thr Thr Val Gly Glu Cys Tyr
405 410 415

Arg Lys Glu Thr Tyr Ser Glu Trp Thr Cys Asp Glu Ser Lys Ile Lys
420 425 430

Met Gly Gln His Gly Ala Cys Ile Pro Pro Arg Arg Gln Lys Leu Cys
435 440 445

Leu His Tyr Leu Glu Lys Ile Met Thr Asn Thr Asn Glu Leu Lys Tyr
450 455 460

Ala Phe Ile Lys Cys Ala Ala Ala Glu Thr Phe Leu Leu Trp Gln Asn
465 470 475 480

Tyr Lys Lys Asp Lys Asn Gly Asn Ala Glu Asp Leu Asp Glu Lys Leu
485 490 495

Lys Gly Gly Ile Ile Pro Glu Asp Phe Lys Arg Gln Met Phe Tyr Thr
500 505 510

Phe Ala Asp Tyr Arg Asp Ile Cys Leu Gly Thr Asp Ile Ser Ser Lys
515 520 525

Lys Asp Thr Ser Lys Gly Val Gly Lys Val Lys Cys Asn Ile Asp Asp
530 535 540

Val Phe Tyr Lys Ile Ser Asn Ser Ile Arg Tyr Arg Lys Ser Trp Trp
545 550 555 560

Glu Thr Asn Gly Pro Val Ile Trp Glu Gly Met Leu Cys Ala Leu Ser
565 570 575

Tyr Asp Thr Ser Leu Asn Asn Val Asn Pro Glu Thr His Lys Lys Leu
580 585 590

Thr Glu Gly Asn Asn Asn Phe Glu Lys Val Ile Phe Gly Ser Asp Ser
595 600 605

Ser Thr Thr Leu Ser Lys Phe Ser Glu Arg Pro Gln Phe Leu Arg Trp
610 615 620

Leu Thr Glu Trp Gly Glu Asn Phe Cys Lys Glu Gln Lys Lys Glu Tyr
625 630 635 640

Lys Val Leu Leu Ala Lys Cys Lys Asp Cys Asp Val Asp Gly Asp Gly
645 650 655

Lys Cys Asn Gly Lys Cys Val Ala Cys Lys Asp Gln Cys Lys Gln Tyr
660 665 670

His Ser Trp Ile Gly Ile Trp Ile Asp Asn Tyr Lys Lys Gln Lys Gly
675 680 685

Arg Tyr Thr Glu Val Lys Lys Ile Pro Leu Tyr Lys Glu Asp Lys Asp
690 695 700

Val Lys Asn Ser Asp Asp Ala Arg Asp Tyr Leu Lys Thr Gln Leu Gln
705 710 715 720

Asn Met Lys Cys Val Asn Gly Thr Thr Asp Glu Asn Cys Glu Tyr Lys
725 730 735

Cys Met His Lys Thr Ser Ser Thr Asn Ser Asp Met Pro Glu Ser Leu
740 745 750

Asp Glu Lys Pro Glu Lys Val Lys Asp Lys Cys Asn Cys Val Pro Asn
755 760 765

Glu Cys Asn Ala Leu Ser Val Ser Gly Ser Gly Phe Pro Asp Gly Gln
770 775 780

Ala Phe Gly Gly Gly Val Leu Glu Gly Thr Cys Lys Gly Leu Gly Glu
785 790 795 800

Pro Lys Lys Lys Ile Glu Pro Pro Gln Tyr Asp Pro Thr Asn Asp Ile
805 810 815

Leu Lys Ser Thr Ile Pro Val Thr Ile Val Leu Ala Leu Gly Ser Ile
820 825 830

Ala Phe Leu Phe Met Lys Val Ile Tyr Ile Tyr Val Trp Tyr Ile Tyr
835 840 845

Met Leu Cys Val Gly Ala Leu Asp Thr Tyr Ile Cys Gly Cys Ile Cys
850 855 860

Ile Cys Ile Phe Ile Cys Val Ser Val Tyr Val Cys Val Tyr Val Tyr
865 870 875 880

Val Phe Leu Tyr Met Cys Val Phe Tyr Ile Tyr Phe Ile Tyr Ile Tyr
885 890 895

Val Phe Ile Leu Lys Met Lys Lys Met Lys Lys Met Lys Lys Met Lys
900 905 910

Lys Met Lys Lys Arg Lys Lys Arg Ile
915 920
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(2) INFORMATION FOR SEQ ID NO : 9 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2101 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GGAACAGGGT GATAATAAAG TAGGAGCCTG TGCTCCGTAT AGACGATTAC ATTTATGTGA 60
TTATAATTTG GAATCTATAG ACACAACGTC GACGACGCAT AAGTTGTTGT TAGAGGTGTG 120
TATGGCAGCA AAATACGAAG GAAACTCAAT AAATACACAT TATACACAAC ATCAACGAAC 180
TAATGAGGAT TCTGCTTCCC AATTATGTAC TGTATTAGCA CGAAGTTTTG CAGATATAGG 240
TGATATCGTA AGAGGAAAAG ATCTATATCT CGGTTATGAT AATAAAGAAA AAGAACAAAG 300
AAAAAAATTA GAACAGAAAT TGAAAGATAT TTTCAAGAAA ATACATAAGG ACGTGATGAA 360
GACGAATGGC GCACAAGAAC GCTACATAGA TGATGCCAAA GGAGGAGATT TTTTTCAATT 420
AAGAGAAGAT TGGTGGACGT CGAATCGAGA AACAGTATGG AAAGCATTAA TATGTCATGC 480
ACCAAAAGAA GCTAATTATT TTATAAAAAC AGCGTGTAAT GTAGGAAAAG GAACTAATGG 540
TCAATGCCAT TGCATTGGTG GAGATGTTCC CACATATTTC GATTATGTGC CGCAGTATCT 600
TCGCTGGTTC GAGGAATGGG CAGAAGACTT TTGCAGGAAA AAAAAAAAAA AACTAGAAAA 660
TTTGCAAAAA CAGTGTCGTG ATTACGAACA AAATTTATAT TGTAGTGGTA ATGGCTACGA 720
TTGCACAAAA ACTATATATA AAAAAGGTAA ACTTGTTATA GGTGAACATT GTACAAACTG 780
TTCTGTTTGG TGTCGTATGT ATGAAACTTG GATAGATAAC CAGAAAAAAG AATTTCTAAA 840
ACAAAAAAGA AAATACGAAA CAGAAATATC AGGTGGTGGT AGTGGTAAGA GTCCTAAAAG 900
GACAAAACGG GCTGCACGTA GTAGTAGTAG TAGTGATGAT AATGGGTATG AAAGTAAATT 960
TTATAAAAAA CTGAAAGAAG TTGGCTACCA AGATGTCGAT AAATTTTTAA AAATATTAAA 1020
CAAAGAAGGA ATATGTCAAA AACAACCTCA AGTAGGAAAT GAAAAAGCAG ATAATGTTGA 1080
TTTTACTAAT GAAAAATATG TAAAAACATT TTCTCGTACA GAAATTTGTG AACCGTGCCC 1140
ATGGTGTGGA TTGGAAAAAG GTGGTCCACC ATGGAAAGTT AAAGGTGACA AAACCTGCGG 1200
AAGTGCAAAA ACAAAGACAT ACGATCCTAA AAATATTACC GATATACCAG TACTCTACCC 1260
TGATAAATCA CAGCAAAATA TACTAAAAAA ATATAAAAAT TTTTGTGAAA AAGGTGCACC 1320
TGGTGGTGGT CAAATTAAAA AATGGCAATG TTATTATGAT GAACATAGGC CTAGTAGTAA 1380
AAATAATAAT AATTGTGTAG AAGGAACATG GGACAAGTTT ACACAAGGTA AACAAACCGT 1440
TAAGTCCTAT AATGTTTTTT TTTGGGATTG GGTTCATGAT ATGTTACACG ATTCTGTAGA 1500
GTGGAAGACA GAACTTAGTA AGTGTATAAA TAATAACACT AATGGCAACA CATGTAGAAA 1560
CAATAATAAA TGTAAAACAG ATTGTGGTTG TTTTCAAAAA TGGGTTGAAA AAAAACAACA 1620
AGAATGGATG GCAATAAAAG ACCATTTTGG AAAGCAAACA GATATTGTCC AACAAAAAGG 1680
TCTTATCGTA TTTAGTCCCT ATGGAGTTCT TGACCTTGTT TTGAAGGGCG GTAATCTGTT 1740
GCAAAATATT AAAGATGTTC ATGGAGATAC AGATGACATA AAACACATTA AGAAACTGTT 1800
GGATGAGGAA GACGCAGTAG CAGTTGTTCT TGGTGGCAAG GACAATACCA CAATTGATAA 1860
ATTACTACAA CACGAAAAAG AACAAGCAGA ACAATGCAAA CAAAAGCAGG AAGAATGCGA 1920
GAAAAAAGCA CAACAAGAAA GTCGTGGTCG CTCCGCCGAA ACCCGCGAAG ACGAAAGGAC 1980
ACAACAACCT GCTGATAGTG CCGGCGAAGT CGAAGAAGAA GAAGACGACG ACGACTACGA 2040
CGAAGACGAC GAAGATGACG ACGTAGTCCA GGACGTAGAT GTAAGTGAAA TAAGAGGTCC 2100






2101 



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 700 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Glu Gln Gly Asp Asn Lys Val Gly Ala Cys Ala Pro Tyr Arg Arg Leu
1 5 10 15

His Leu Cys Asp Tyr Asn Leu Glu Ser Ile Asp Thr Thr Ser Thr Thr
20 25 30

His Lys Leu Leu Leu Glu Val Cys Met Ala Ala Lys Tyr Glu Gly Asn
35 40 45

Ser Ile Asn Thr His Tyr Thr Gln His Gln Arg Thr Asn Glu Asp Ser
50 55 60

Ala Ser Gln Leu Cys Thr Val Leu Ala Arg Ser Phe Ala Asp Ile Gly
65 70 75 80

Asp Ile Val Arg Gly Lys Asp Leu Tyr Leu Gly Tyr Asp Asn Lys Glu
85 90 95

Lys Glu Gln Arg Lys Lys Leu Glu Gln Lys Leu Lys Asp Ile Phe Lys
100 105 110

Lys Ile His Lys Asp Val Met Lys Thr Asn Gly Ala Gln Glu Arg Tyr
115 120 125

Ile Asp Asp Ala Lys Gly Gly Asp Phe Phe Gln Leu Arg Glu Asp Trp
130 135 140

Trp Thr Ser Asn Arg Glu Thr Val Trp Lys Ala Leu Ile Cys His Ala
145 150 155 160

Pro Lys Glu Ala Asn Tyr Phe Ile Lys Thr Ala Cys Asn Val Gly Lys
165 170 175

Gly Thr Asn Gly Gln Cys His Cys Ile Gly Gly Asp Val Pro Thr Tyr
180 185 190

Phe Asp Tyr Val Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu
195 200 205

Asp Phe Cys Arg Lys Lys Lys Lys Lys Leu Glu Asn Leu Gln Lys Gln
210 215 220

Cys Arg Asp Tyr Glu Gln Asn Leu Tyr Cys Ser Gly Asn Gly Tyr Asp
225 230 235 240

Cys Thr Lys Thr Ile Tyr Lys Lys Gly Lys Leu Val Ile Gly Glu His
245 250 255

Cys Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Glu Thr Trp Ile Asp
260 265 270

Asn Gln Lys Lys Glu Phe Leu Lys Gln Lys Arg Lys Tyr Glu Thr Glu
275 280 285

Ile Ser Gly Gly Gly Ser Gly Lys Ser Pro Lys Arg Thr Lys Arg Ala
290 295 300

Ala Arg Ser Ser Ser Ser Ser Asp Asp Asn Gly Tyr Glu Ser Lys Phe
305 310 315 320

Tyr Lys Lys Leu Lys Glu Val Gly Tyr Gln Asp Val Asp Lys Phe Leu
325 330 335

Lys Ile Leu Asn Lys Glu Gly Ile Cys Gln Lys Gln Pro Gln Val Gly
340 345 350

Asn Glu Lys Ala Asp Asn Val Asp Phe Thr Asn Glu Lys Tyr Val Lys
355 360 365

Thr Phe Ser Arg Thr Glu Ile Cys Glu Pro Cys Pro Trp Cys Gly Leu
370 375 380

Glu Lys Gly Gly Pro Pro Trp Lys Val Lys Gly Asp Lys Thr Cys Gly
385 390 395 400

Ser Ala Lys Thr Lys Thr Tyr Asp Pro Lys Asn Ile Thr Asp Ile Pro
405 410 415

Val Leu Tyr Pro Asp Lys Ser Gln Gln Asn Ile Leu Lys Lys Tyr Lys
420 425 430

Asn Phe Cys Glu Lys Gly Ala Pro Gly Gly Gly Gln Ile Lys Lys Trp
435 440 445

Gln Cys Tyr Tyr Asp Glu His Arg Pro Ser Ser Lys Asn Asn Asn Asn
450 455 460

Cys Val Glu Gly Thr Trp Asp Lys Phe Thr Gln Gly Lys Gln Thr Val
465 470 475 480

Lys Ser Tyr Asn Val Phe Phe Trp Asp Trp Val His Asp Met Leu His
485 490 495

Asp Ser Val Glu Trp Lys Thr Glu Leu Ser Lys Cys Ile Asn Asn Asn
500 505 510

Thr Asn Gly Asn Thr Cys Arg Asn Asn Asn Lys Cys Lys Thr Asp Cys
515 520 525

Gly Cys Phe Gln Lys Trp Val Glu Lys Lys Gln Gln Glu Trp Met Ala
530 535 540

Ile Lys Asp His Phe Gly Lys Gln Thr Asp Ile Val Gln Gln Lys Gly
545 550 555 560

Leu Ile Val Phe Ser Pro Tyr Gly Val Leu Asp Leu Val Leu Lys Gly
565 570 575

Gly Asn Leu Leu Gln Asn Ile Lys Asp Val His Gly Asp Thr Asp Asp
580 585 590

Ile Lys His Ile Lys Lys Leu Leu Asp Glu Glu Asp Ala Val Ala Val
595 600 605

Val Leu Gly Gly Lys Asp Asn Thr Thr Ile Asp Lys Leu Leu Gln His
610 615 620

Glu Lys Glu Gln Ala Glu Gln Cys Lys Gln Lys Gln Glu Glu Cys Glu
625 630 635 640

Lys Lys Ala Gln Gln Glu Ser Arg Gly Arg Ser Ala Glu Thr Arg Glu
645 650 655

Asp Glu Arg Thr Gln Gln Pro Ala Asp Ser Ala Gly Glu Val Glu Glu
660 665 670

Glu Glu Asp Asp Asp Asp Tyr Asp Glu Asp Asp Glu Asp Asp Asp Val
675 680 685

Val Gln Asp Val Asp Val Ser Glu Ile Arg Gly Pro
690 695 700

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8220 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

AAAAATGGGG CCCAAGGAGG CTGCAGGTGG GGATGATATT GAGGATGAAA GTGCCAAACA 60
TATGTTTGAT AGGATAGGAA AAGATGTGTA CGATAAAGTA AAAGAGGAAG CTAAAGAACG 120
TGGTAAAGGC TTGCAAGGAC GTTTGTCAGA AGCAAAATTT GAGAAAAATG AAAGCGATCC 180
ACAAACACCA GAAGATCCAT GCGATCTTGA TCATAAATAT CATACAAATG TAACTACTAA 240
TGTAATTAAT CCGTGCGCTG ATAGATCTGA CGTGCGTTTT TCCGATGAAT ATGGAGGTCA 300
ATGTACACAT AATAGAATAA AAGATAGTCA ACAGGGTGAT AATAAAGGTG CATGTGCTCC 360
ATATAGGCGA TTGCATGTAT GCGATCAAAA TTTAGAACAG ATAGAGCCTA TAAAAATAAC 420
AAATACTCAT AATTTATTGG TAGATGTGTG TATGGCAGCA AAATTTGAAG GACAATCAAT 480
AACACAAGAT TATCCAAAAT ATCAAGCAAC ATATGGTGAT TCTCCTTCTC AAATATGTAC 540
TATGCTGGCA CGAAGTTTTG CGGACATAGG GGACATTGTC AGAGGAAGAG ATTTGTATTT 600
AGGTAATCCA CAAGAAATAA AACAAAGACA ACAATTAGAA AATAATTTGA AAACAATTTT 660
CGGGAAAATA TATGAAAAAT TGAATGGCGC AGAAGCACGC TACGGAAATG ATCCGGAATT 720
TTTTAAATTA CGAGAAGATT GGTGGACTGC TAATCGAGAA ACAGTATGGA AAGCCATCAC 780
ATGTAACGCT TGGGGTAATA CATATTTTCA TGCAACGTGC AATAGAGGAG AACGAACTAA 840
AGGTTACTGC CGGTGTAACG ACGACCAAGT TCCCACATAT TTTGATTATG TGCCGCAGTA 900
TCTTCGCTGG TTCGAGGAAT GGGCAGAAGA TTTTTGTAGG AAAAAAAATA AAAAAATAAA 960
AGATGTTAAA AGAAATTGTC GTGGAAAAGA TAAAGAGGAT AAGGATCGAT ATTGTAGCCG 1020
TAATGGCTAC GATTGCGAAA AAACTAAACG AGCGATTGGT AAGTTGCGTT ATGGTAAGCA 1080
ATGCATTAGC TGTTTGTATG CATGTAATCC TTACGTTGAT TGGATAAATA ACCAAAAAGA 1140
ACAATTTGAC AAACAGAAAA AAAAATATGA TGAAGAAATA AAAAAATATG AAAATGGAGC 1200
ATCAGGTGGT AGTAGGCAAA AACGGGATGC AGGTGGTACA ACTACTACTA ATTATGATGG 1260
ATATGAAAAA AAATTTTATG ACGAACTTAA TAAAAGTGAA TATAGAACCG TTGATAAATT 1320
TTTGGAAAAA TTAAGTAATG AAGAAATATG CACAAAAGTT AAAGAGGAAG AAGGAGGAAC 1380
AATTGATTTT AAAAACGTTA ATAGTGATAG TACTAGTGGT GCTAGTGGCA CTAATGTTGA 1440
AAGTCAAGGA ACATTTTATC GTTCAAAATA TTGCCAACCC TGCCCTTATT GTGGAGTGAA 1500
AAAGGTAAAT AATGGTGGTA GTAGTAATGA ATGGGAAGAG AAAAATAATG GCAAGTGCAA 1560
GAGTGGAAAA CTTTATGAGC CTAAACCCGA CAAAGAAGGT ACTACTATTA CAATCCTTAA 1620
AAGTGGTAAA GGACATGATG ATATTGAAGA AAAATTAAAC AAATTTTGTG ATGAAAAAAA 1680
TGGTGATACA ATAAATAGTG GTGGTAGTGG TACGGGTGGT AGTGGTGGTG GTAACAGTGG 1740
TAGACAGGAA TTGTATGAAG AATGGAAATG TTATAAAGGT GAAGATGTAG TGAAAGTTGG 1800
ACACGATGAG GATGACGAGG AGGATTATGA AAATGTAAAA AATGCAGGCG GATTATGTAT 1860
ATTAAAAAAC CAAAAAAAGA ATAAAGAAGA AGGTGGAAAT ACGTCTGAAA AGGAGCCTGA 1920
TGAAATCCAA AAGACATTCA ATCCTTTTTT TTACTATTGG GTTGCACATA TGTTAAAAGA 1980
TTCCATACAT TGGAAAAAAA AACTTCAGAG ATGTTTACAA AATGGTAACA GAATAAAATG 2040
TGGAAACAAT AAATGTAATA ATGATTGTGA ATGTTTTAAA AGATGGATTA CACAAAAAAA 2100
AGACGAATGG GGGAAAATAG TACAACATTT TAAAACGCAA AATATTAAAG GTAGAGGAGG 2160
TAGTGACAAT ACGGCAGAAT TAATCCCATT TGATGACGAT TATGTTCTTC AATACAATTT 2220
GCAAGAAGAA TTTTTGAAAG GCGATTCCGA AGACGCTTCC GAAGAAAAAT CCGAAAATAG 2280
TCTGGATGCA GAGGAGGCAG AGGAACTAAA ACACCTTCGC GAAATCATTG AAAGTGAAGA 2340
CAATAATCAA GAAGCATCTG TTGGTGGTGG CGTCACTGAA CAAAAAAATA TAATGGATAA 2400
ATTGCTCAAC TACGAAAAAG ACGAAGCCGA TTTATGCCTA GAAATTCACG AAGATGAGGA 2460
AGAGGAAAAA GAAAAAGGAG ACGGAAACGA ATGTATCGAA GAGGGCGAAA ATTTTCGTTA 2520
TAATCCATGT AGTGGCGAAA GTGGTAACAA ACGATACCCC GTTCTTGCGA ACAAAGTAGC 2580
GTATCAAATG CATCACAAGG CAAAGACACA ATTGGCTAGT CGTGCTGGTA GAAGTGCGTT 2640
GAGAGGTGAT ATATCCTTAG CGCAATTTAA AAATGGTCGT AACGGAAGTA CATTGAAAGG 2700
ACAAATTTGC AAAATTAACG AAAACTATTC CAATGATAGT CGTGGTAATA GTGGTGGACC 2760
ATGTACAGGC AAAGATGGAG ATCACGGAGG TGTGCGCATG AGAATAGGAA CGGAATGGTC 2820
AAATATTGAA GGAAAAAAAC AAACGTCATA CAAAAACGTC TTTTTACCTC CCCGACGAGA 2880
ACACATGTGT ACATCCAATT TAGAAAATTT AGATGTTGGT AGTGTCACTA AAAATGATAA 2940
GGCTAGCCAC TCATTATTGG GAGATGTTCA GCTCGCAGCA AAAACTGATG CAGCTGAGAT 3000
AATAAAACGC TATAAAGATC AAAATAATAT ACAACTAACT GATCCAATAC AACAAAAAGA 3060
CCAGGAGGCT ATGTGTCGAG CTGTACGTTA TAGTTTTGCC GATTTAGGAG ACATTATTCG 3120
AGGAAGAGAT ATGTGGGATG AGGATAAGAG CTCAACAGAC ATGGAAACAC GTTTGATAAC 3180
CGTATTTAAA AACATTAAAG AAAAACATGA TGGAATCAAA GACAACCCTA AATATACCGG 3240
TGATGAAAGC AAAAAGCCCG CATATAAAAA ATTACGAGCA GATTGGTGGG AAGCAAATAG 3300
ACATCAAGTG TGGAGAGCCA TGAAATGCGC AACAAAAGGC ATCATATGTC CTGGTATGCC 3360
AGTTGACGAT TATATCCCCC AACGTTTACG CTGGATGACT GAATGGGCTG AATGGTATTG 3420
TAAAGCGCAA TCACAGGAGT ATGACAAGTT AAAAAAAATC TGTGCAGATT GTATGAGTAA 3480
GGGTGATGGA AAATGTACGC AAGGTGATGT CGATTGTGGA AAGTGCAAAG CAGCATGTGA 3540
TAAATATAAA GAGGAAATAG AAAAATGGAA TGAACAATGG AGAAAAATAT CAGATAAATA 3600
CAATCTATTA TACCTACAAG CAAAAACTAC TTCTACTAAT CCTGGCCGTA CTGTTCTTGG 3660
TGATGACGAT CCCGACTATC AACAAATGGT AGATTTTTTG ACCCCAATAC ACAAAGCAAG 3720
TATTGCCGCA CGTGTTCTTG TTAAACGTGC TGCTGGTAGT CCCACTGAGA TCGCCGCCGC 3780
CGCCCCGATC ACCCCCTACA GTACTGCTGC CGGATATATA CACCAGGAAA TAGGATATGG 3840
GGGGTGCCAG GAACAAACAC AATTTTGTGA AAAAAAACAT GGTGCAACAT CAACTAGTAC 3900
CACGAAAGAA AACAAAGAAT ACACCTTTAA ACAACCTCCG CCGGAGTATG CTACAGCGTG 3960
TGATTGCATA AATAGGTCGC AAACAGAGGA GCCGAAGAAA AAGGAAGAAA ATGTAGAGAG 4020
TGCCTGCAAA ATAGTGGAGA AAATACTTGA GGGTAAGAAT GGAAGGACTA CAGTAGGTGA 4080
ATGTAATCCA AAAGAGAGTT ATCCTGATTG GGATTGCAAA AACAATATTG ACATTAGTCA 4140
TGATGGTGCT TGTATGCCTC CAAGGAGACA AAAACTATGT TTATATTATA TAGCACATGA 4200
GAGTCAAACA GAAAATATAA AAACAGAGGA TAATTTGAAA GATGCTTTTA TTAAAACTGC 4260
AGCAGCAGAA ACTTTTCTTT CATGGCAATA TTATAAGAGT AAGAATGATA GTGAAGCTAA 4320
AATATTAGAT AGAGGCCTTA TTCCATCCCA ATTTTTAAGA TCCATGATGT ACACGTTTGG 4380
AGATTATAGA GATATATGTT TGAACACAGA TATATCTAAA AAACAAAATG ATGTAGCTAA 4440
GGCAAAAGAT AAAATAGGTA AATTTTTCTC AAAAGATGGC AGCAAATCTC CTAGTGGCTT 4500
ATCACGCCAA GAATGGTGGA AAACAAATGG TCCAGAGATT TGGAAAGGAA TGTTATGTGC 4560
CTTAACAAAA TACGTCACAG ATACCGATAA CAAAAGAAAA ATCAAAAACG ACTACTCATA 4620
CGATAAAGTC AACCAATCCC AAAATGGCAA CCCTTCCCTT GAAGAGTTTG CTGCTAAACC 4680
TCAATTTCTA CGTTGGATGA TCGAATGGGG AGAAGAGTTT TGTGCTGAAC GTCAGAAGAA 4740
GGAAAATATC ATAAAAGATG CATGTAATGA AATAAATTCT ACACAACAGT GTAATGATGC 4800
GAAACATCGT TGTAATCAAG CATGTAGAGC ATATCAAGAA TATGTTGAAA ATAAAAAAAA 4860
AGAATTTTCG GGACAAACAA ATAACTTTGT TCTAAAGGCA AATGTTCAGC CCCAAGATCC 4920
AGAATATAAA GGATATGAAT ATAAAGACGG CGTACAACCG ATACAGGGGA ATGAGTATTT 4980
ACTGCAAAAA TGTGATAATA ATAAATGTTC TTGCATGGAT GGAAATGTAC TTTCCGTCTC 5040
TCCAAAAGAA AAACCTTTTG GAAAATATGC CCATAAATAT CCTGAGAAAT GTGATTGTTA 5100
TCAAGGAAAA CATGTACCTA GCATACCACC TCCCCCCCCA CCTGTACAAC CACAACCGGA 5160
AGCACCAACA GTAACAGTAG ACGTTTGCAG CATAGTAAAA ACACTATTTA AAGACACAAA 5220
CAATTTTTCC GACGCTTGTG GTCTAAAATA CGGCAAAACC GCACCATCCA GTTGGAAATG 5280
TATACCAAGT GACACAAAAA GTGGTGCTGG TGCCACCACC GGCAAAAGTG GTAGTGATAG 5340
TGGTAGTATT TGTATCCCAC CCAGGAGGCG ACGATTATAT GTGGGGAAAC TACAGGAGTG 5400
GGCTACCGCG CTCCCACAAG GTGAGGGCGC CGCGCCGTCC CACTCACGCG CCGACGACTT 5460
GCGCAATGCG TTCATCCAAT CTGCTGCAAT AGAGACTTTT TTCTTATGGG ATAGATATAA 5520
AGAAGAGAAA AAACCACAGG GTGATGGGTC ACAACAAGCA CTATCACAAC TAACCAGTAC 5580
ATACAGTGAT GACGAGGAGG ACCCCCCCGA CAAACTGTTA CAAAATGGTA AGATACCCCC 5640
CGATTTTTTG AGATTAATGT TCTATACATT AGGAGATTAT AGGGATATTT TAgSS?^ 5700
TGGTAACACA AGTGACAGTG GTAACACAAA TGGTAGTAAC AACAACAATA TTCTgS?^ 5760
AGCGAGTGGT AACAAGGAGG ACATGCAAAA AATACAAGAG AAAATAGAAC AAATTCTCCC 5820
AAAAAATGGT GGCACACCTC TTGTCCCAAA ATCTAGTGCC CAAACACCTG ATAAATGGTG 5880
GAATGAACAC GCCGAATCTA TCTGGAAAGG TATGATATGT GCATTCaS? SJJJ^gSS 5940
GAACCCTGAC ACCAGTGCAA GAGGCGACGA AAACAAAATA GAAAAGGATG ATgSgTG?? 6000
CGAGAAATTT TTTGGCAGCA CAGCCGACAA ACATGGCACA GCCTCAACCC SaACCGgSc 6060
^TACAAAACC CAATACGACT ACGAAAAAGT CAAACTTGAG GATACAAGTG GTGcSSJ? 6120
CCCCTCAGCC TCTAGTGATA CACCCCTTCT CTCCGATTTC GTGTTACGCC CCC^CTA^ 6180
CCGTTACCTT GAAGAATGGG GTCAAAATTT TTGTAAAAAA AGAAAGCATA AATTGGcSa 6240
^l^:^^^"^ GAGTGTAAAG TAGAAGAAAA TGGTGGTGGT AGTCGTCGTG GTGGTATAAC 6300
AAGACAATAT AGTGGGGATG GCGAAGCGTG TAATGAGATG CTTCCAAAAA ACGATGGAAC 6360
lrrJln?i^n TI^G^GC CGAGTTGTGC CAAACCTTGT AGTTC??^ 6420
AGAAAGCAAG GGAAAAGAGT TTGAGAAACA AGAAAAGGCA TATGAACAAC AAAAAGACAA 6480
ATGTGTAAAT GGAAGTAATA AGCATGATAA TGGATTTTGT GAAACACTAA CAACGTCCTC 6540
TAAAGCTAAA GACTTTTTAA AAACGTTAGG ACCATGTAAA CCTAATAATG TAGAGGGTAA 6600
5n t^^^™ GATGATGATA AAACCTTTAA ACATACAAAA GATTGTGATC ?K?StS?^ 6660
ATTTAGTGTT AATTGTAAAA AAGATGAATG TGATAATTCT AAAGGAACCG ATTGCCGAAA 6720
TAAAAATAGT ATTGATGCAA CAGATATTGA AAATGGAGTG GATTCTACTG TACTAGAAAT 6780
GCGTGTCAGT GCTGATAGTA AAAGTGGATT TAATGGTGAT GGTTTAGAGA ATGCTTGtKg 6840
AGGTGCTGGT ATCTTTGAAG GTATTAGAAA AGATGAATGG AAATGTCGTA ATGTATGTGG 6900
TTATGTTGTA TGTAAACCGG AAAACGTTAA TGGGGAAGCA AAGGGAAAAC ACATtI?IS 6960
AATTAGAGCA CTGGTTAAAC GTTGGGTAGA ATATTTTTTT GAAGATTATA ATAAAATAAA 7020
ACATAAAATT TCACATCGCA TAAAAAATGG TGAAATATCT CCATGTATAA AAAATTGTGT 7080
AGAAAAATGG GTAGATCAGA AAAGAAAAGA ATGGAAGGAA ATTACTGAAC GTTTCAAAGA 7140
TCAATATAAA AATGACAATT CAGATGATGA CAATGTGAGA AGTTTTTTGG AGACCTTGAT 7200
^S^Z:?^'^'^ ACTGATGCAA ACGGTAAAAA TAAGGTTATA AAATTAAGTA AGTTCGGTAA 7260
TTCTTGTGGA TGTAGTGCCA GTGCGAACGA ACAAAACAAA AATGGTGAAT ACAAGGACGC 7320
ATGCTTAAAA AGCTTAAAGA TAAAATTGGC GAGTGCGAAA AGAAACACCA 7380
TCAAACTAGT GATACCGAGT GTTCCGACAC ACCACAACCG CAAACCCTTG AAGACGAAAC 7440
TTTGGATGAT GATATAGAAA CAGAGGAGGC GAAGAAGAAC ATGATGCCGA AAATTTGTGA 7500
AAATGTGTTA AAAACAGCAC AACAAGAGGA TGAAGGCGGT TGTGTCCCAG CAGAAAATAG 7560
TGAAGAACCG GCAGCAACAG ATAGTGGTAA GGAAACCCCC GAACAAACCC CCGTTCTCAA 7620
GAAGCAGTAC CGGAACCACC ACCTCCACCC CCACAGGAAA AAGCCCCGGC 7680
ACCAATACCC CAACCACAAC CACCAACCCC CCCCACACAA CTCTTGGATA ATCCCCACGT 7740
TCTAACCGCC CTGGTGACCT CCACCCTCGC CTGGAGCGTT GGCATCGGTT TTGCTACATT 7800
CACTTATTTT TATCTAAAGG TAAATGGAAG TATATATATG GGGATGTGGA TGTATGTGGA 7860
TGTATGTGAA TGTATGTGGA TGTATGTGGA TGTATGTGGA TGTGTTTTAT GGATATGTAT 7920
TTGTGATTAT GTTTGGATAT ATATATATAT ATATATATGT TTATGTATAT GTGTTTTTGG 7980
ATATATATAT GTGTATGTAT ATGATTTTCT GTATATGTAT TTGTGGGTTA AGGATATATA 8040
TATATGGATG TACTTGTATG TGTTTTATAT ATATATTTTA TATATATGTA TTTATATTAA 8100
AAAAGAAATA TAAAAACAAA TTTATTAAAA TGAAAAAAAG AAAAATGAAA TATAAAAAAA 8160
AATTTATTAA AATAAAAAAA AAAAAAAAAA AAAAGGAGAA AAATTTTTTA AAAAATAATA 8220

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2710 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:



Asn Val Met Val Glu Leu Ala Lys Met Gly Pro Lys Glu Ala Ala Gly
1 5 10 15

Gly Asp Asp Ile Glu Asp Glu Ser Ala Lys His Met Phe Asp Arg Ile
20 25 30

Gly Lys Asp Val Tyr Asp Lys Val Lys Glu Glu Ala Lys Glu Arg Gly
35 40 45

Lys Gly Leu Gln Gly Arg Leu Ser Glu Ala Lys Phe Glu Lys Asn Glu
50 55 60

Ser Asp Pro Gln Thr Pro Glu Asp Pro Cys Asp Leu Asp His Lys Tyr
65 70 75 80

His Thr Asn Val Thr Thr Asn Val Ile Asn Pro Cys Ala Asp Arg Ser
85 90 95

Asp Val Arg Phe Ser Asp Glu Tyr Gly Gly Gln Cys Thr His Asn Arg
100 105 110

Ile Lys Asp Ser Gln Gln Gly Asp Asn Lys Gly Ala Cys Ala Pro Tyr
115 120 125

Arg Arg Leu His Val Cys Asp Gln Asn Leu Glu Gln Ile Glu Pro Ile
130 135 140

Lys Ile Thr Asn Thr His Asn Leu Leu Val Asp Val Cys Met Ala Ala
145 150 155 160

Lys Phe Glu Gly Gln Ser Ile Thr Gln Asp Tyr Pro Lys Tyr Gln Ala
165 170 175

Thr Tyr Gly Asp Ser Pro Ser Gln Ile Cys Thr Met Leu Ala Arg Ser
180 185 190

Phe Ala Asp Ile Gly Asp Ile Val Arg Gly Arg Asp Leu Tyr Leu Gly
195 200 205

Asn Pro Gln Glu Ile Lys Gln Arg Gln Gln Leu Glu Asn Asn Leu Lys
210 215 220

Thr Ile Phe Gly Lys Ile Tyr Glu Lys Leu Asn Gly Ala Glu Ala Arg
225 230 235 240

Tyr Gly Asn Asp Pro Glu Phe Phe Lys Leu Arg Glu Asp Trp Trp Thr
245 250 255

Ala Asn Arg Glu Thr Val Trp Lys Ala Ile Thr Cys Asn Ala Trp Gly
260 265 270

Asn Thr Tyr Phe His Ala Thr Cys Asn Arg Gly Glu Arg Thr Lys Gly
275 280 285

Tyr Cys Arg Cys Asn Asp Asp Gln Val Pro Thr Tyr Phe Asp Tyr Val
290 295 300

Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg
305 310 315 320

Lys Lys Asn Lys Lys Ile Lys Asp Val Lys Arg Asn Cys Arg Gly Lys
325 330 335

Asp Lys Glu Asp Lys Asp Arg Tyr Cys Ser Arg Asn Gly Tyr Asp Cys
340 345 350

Glu Lys Thr Lys Arg Ala Ile Gly Lys Leu Arg Tyr Gly Lys Gln Cys
355 360 365

Ile Ser Cys Leu Tyr Ala Cys Asn Pro Tyr Val Asp Trp Ile Asn Asn
370 375 380

Gln Lys Glu Gln Phe Asp Lys Gln Lys Lys Lys Tyr Asp Glu Glu Ile
385 390 395 400

Lys Lys Tyr Glu Asn Gly Ala Ser Gly Gly Ser Arg Gln Lys Arg Asp
405 410 415

Ala Gly Gly Thr Thr Thr Thr Asn Tyr Asp Gly Tyr Glu Lys Lys Phe
420 425 430

Tyr Asp Glu Leu Asn Lys Ser Glu Tyr Arg Thr Val Asp Lys Phe Leu
435 440 445

Glu Lys Leu Ser Asn Glu Glu Ile Cys Thr Lys Val Lys Asp Glu Glu
450 455 460

Gly Gly Thr Ile Asp Phe Lys Asn Val Asn Ser Asp Ser Thr Ser Gly
465 470 475 480

Ala Ser Gly Thr Asn Val Glu Ser Gln Gly Thr Phe Tyr Arg Ser Lys
485 490 495

Tyr Cys Gln Pro Cys Pro Tyr Cys Gly Val Lys Lys Val Asn Asn Gly
500 505 510

Gly Ser Ser Asn Glu Trp Glu Glu Lys Asn Asn Gly Lys Cys Lys Ser
515 520 525

Gly Lys Leu Tyr Glu Pro Lys Pro Asp Lys Glu Gly Thr Thr Ile Thr
530 535 540

Ile Leu Lys Ser Gly Lys Gly His Asp Asp Ile Glu Glu Lys Leu Asn
545 550 555 560

Lys Phe Cys Asp Glu Lys Asn Gly Asp Thr Ile Asn Ser Gly Gly Ser
565 570 575

Gly Thr Gly Gly Ser Gly Gly Gly Asn Ser Gly Arg Gln Glu Leu Tyr
580 585 590

Glu Glu Trp Lys Cys Tyr Lys Gly Glu Asp Val Val Lys Val Gly His
595 600 605

Asp Glu Asp Asp Glu Glu Asp Tyr Glu Asn Val Lys Asn Ala Gly Gly
610 615 620

Leu Cys Ile Leu Lys Asn Gln Lys Lys Asn Lys Glu Glu Gly Gly Asn
625 630 635 640

Thr Ser Glu Lys Glu Pro Asp Glu Ile Gln Lys Thr Phe Asn Pro Phe
645 650 655

Phe Tyr Tyr Trp Val Ala His Met Leu Lys Asp Ser Ile His Trp Lys
660 665 670

Lys Lys Leu Gln Arg Cys Leu Gln Asn Gly Asn Arg Ile Lys Cys Gly
675 680 685

Asn Asn Lys Cys Asn Asn Asp Cys Glu Cys Phe Lys Arg Trp Ile Thr
690 695 700

Gln Lys Lys Asp Glu Trp Gly Lys Ile Val Gln His Phe Lys Thr Gln
705 710 715 720

Asn Ile Lys Gly Arg Gly Gly Ser Asp Asn Thr Ala Glu Leu Ile Pro
725 730 735

Phe Asp His Asp Tyr Val Leu Gln Tyr Asn Leu Gln Glu Glu Phe Leu
740 745 750

Lys Gly Asp Ser Glu Asp Ala Ser Glu Glu Lys Ser Glu Asn Ser Leu
755 760 765

Asp Ala Glu Glu Ala Glu Glu Leu Lys His Leu Arg Glu Ile Ile Glu
770 775 780

Ser Glu Asp Asn Asn Gln Glu Ala Ser Val Gly Gly Gly Val Thr Glu
785 790 795 800

Gln Lys Asn Ile Met Asp Lys Leu Leu Asn Tyr Glu Lys Asp Glu Ala
805 810 815

Asp Leu Cys Leu Glu Ile His Glu Asp Glu Glu Glu Glu Lys Glu Lys
820 825 830

Gly Asp Gly Asn Glu Cys Ile Glu Glu Gly Glu Asn Phe Arg Tyr Asn
835 840 845

Pro Cys Ser Gly Glu Ser Gly Asn Lys Arg Tyr Pro Val Leu Ala Asn
850 855 860

Lys Val Ala Tyr Gln Met His His Lys Ala Lys Thr Gln Leu Ala Ser
865 870 875 880

Arg Ala Gly Arg Ser Ala Leu Arg Gly Asp Ile Ser Leu Ala Gln Phe
885 890 895

Lys Asn Gly Arg Asn Gly Ser Thr Leu Lys Gly Gln Ile Cys Lys Ile
900 905 910

Asn Glu Asn Tyr Ser Asn Asp Ser Arg Gly Asn Ser Gly Gly Pro Cys
915 920 925

Thr Gly Lys Asp Gly Asp His Gly Gly Val Arg Met Arg Ile Gly Thr
930 935 940

Glu Trp Ser Asn Ile Glu Gly Lys Lys Gln Thr Ser Tyr Lys Asn Val
945 950 955 960

Phe Leu Pro Pro Arg Arg Glu His Met Cys Thr Ser Asn Leu Glu Asn
965 970 975

Leu Asp Val Gly Ser Val Thr Lys Asn Asp Lys Ala Ser His Ser Leu
980 985 990

Leu Gly Asp Val Gln Leu Ala Ala Lys Thr Asp Ala Ala Glu Ile Ile
995 1000 1005

Lys Arg Tyr Lys Asp Gln Asn Asn Ile Gln Leu Thr Asp Pro Ile Gln
1010 1015 1020

Gln Lys Asp Gln Glu Ala Met Cys Arg Ala Val Arg Tyr Ser Phe Ala
1025 1030 1035 1040

Asp Leu Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp Glu Asp Lys
1045 1050 1055

Ser Ser Thr Asp Met Glu Thr Arg Leu Ile Thr Val Phe Lys Asn Ile
1060 1065 1070

Lys Glu Lys His Asp Gly Ile Lys Asp Asn Pro Lys Tyr Thr Gly Asp
1075 1080 1085

Glu Ser Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp Trp Trp Glu
1090 1095 1100

Ala Asn Arg His Gln Val Trp Arg Ala Met Lys Cys Ala Thr Lys Gly
1105 1110 1115 1120

Ile Ile Cys Pro Gly Met Pro Val Asp Asp Tyr Ile Pro Gln Arg Leu
1125 1130 1135

Arg Trp Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala Gln Ser Gln
1140 1145 1150

Glu Tyr Asp Lys Leu Lys Lys Ile Cys Ala Asp Cys Met Ser Lys Gly
1155 1160 1165

Asp Gly Lys Cys Thr Gln Gly Asp Val Asp Cys Gly Lys Cys Lys Ala
1170 1175 1180

Ala Cys Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp Asn Glu Gln Trp
1185 1190 1195 1200

Arg Lys Ile Ser Asp Lys Tyr Asn Leu Leu Tyr Leu Gln Ala Lys Thr
1205 1210 1215

Thr Ser Thr Asn Pro Gly Arg Thr Val Leu Gly Asp Asp Asp Pro Asp
1220 1225 1230

Tyr Gln Gln Met Val Asp Phe Leu Thr Pro Ile His Lys Ala Ser Ile
1235 1240 1245

Ala Ala Arg Val Leu Val Lys Arg Ala Ala Gly Ser Pro Thr Glu Ile
1250 1255 1260

Ala Ala Ala Ala Pro Ile Thr Pro Tyr Ser Thr Ala Ala Gly Tyr Ile
1265 1270 1275 1280

His Gln Glu Ile Gly Tyr Gly Gly Cys Gln Glu Gln Thr Gln Phe Cys
1285 1290 1295

Glu Lys Lys His Gly Ala Thr Ser Thr Ser Thr Thr Lys Glu Asn Lys
1300 1305 1310

Glu Tyr Thr Phe Lys Gln Pro Pro Pro Glu Tyr Ala Thr Ala Cys Asp
1315 1320 1325

Cys Ile Asn Arg Ser Gln Thr Glu Glu Pro Lys Lys Lys Glu Glu Asn
1330 1335 1340

Val Glu Ser Ala Cys Lys Ile Val Glu Lys Ile Leu Glu Gly Lys Asn
1345 1350 1355 1360

Gly Arg Thr Thr Val Gly Glu Cys Asn Pro Lys Glu Ser Tyr Pro Asp
1365 1370 1375

Trp Asp Cys Lys Asn Asn Ile Asp Ile Ser His Asp Gly Ala Cys Met
1380 1385 1390

Pro Pro Arg Arg Gln Lys Leu Cys Leu Tyr Tyr Ile Ala His Glu Ser
1395 1400 1405

Gln Thr Glu Asn Ile Lys Thr Asp Asp Asn Leu Lys Asp Ala Phe Ile
1410 1415 1420

Lys Thr Ala Ala Ala Glu Thr Phe Leu Ser Trp Gln Tyr Tyr Lys Ser
1425 1430 1435 1440

Lys Asn Asp Ser Glu Ala Lys Ile Leu Asp Arg Gly Leu Ile Pro Ser
1445 1450 1455

Gln Phe Leu Arg Ser Met Met Tyr Thr Phe Gly Asp Tyr Arg Asp Ile
1460 1465 1470

Cys Leu Asn Thr Asp Ile Ser Lys Lys Gln Asn Asp Val Ala Lys Ala
1475 1480 1485

Lys Asp Lys Ile Gly Lys Phe Phe Ser Lys Asp Gly Ser Lys Ser Pro
1490 1495 1500

Ser Gly Leu Ser Arg Gln Glu Trp Trp Lys Thr Asn Gly Pro Glu Ile
1505 1510 1515 1520

Trp Lys Gly Met Leu Cys Ala Leu Thr Lys Tyr Val Thr Asp Thr Asp
1525 1530 1535

Asn Lys Arg Lys Ile Lys Asn Asp Tyr Ser Tyr Asp Lys Val Asn Gln
1540 1545 1550
Ser Gln Asn Gly Asn Pro Ser Leu Glu Glu Phe Ala Ala Lys Pro Gln
1555 1560 1565

Phe Leu Arg Trp Met Ile Glu Trp Gly Glu Glu Phe Cys Ala Glu Arg
1570 1575 1580

Gln Lys Lys Glu Asn Ile Ile Lys Asp Ala Cys Asn Glu Ile Asn Ser
1585 1590 1595 1600

Thr Gln Gln Cys Asn Asp Ala Lys His Arg Cys Asn Gln Ala Cys Arg
1605 1610 1615

Ala Tyr Gln Glu Tyr Val Glu Asn Lys Lys Lys Glu Phe Ser Gly Gln
1620 1625 1630

Thr Asn Asn Phe Val Leu Lys Ala Asn Val Gln Pro Gln Asp Pro Glu
1635 1640 1645

Tyr Lys Gly Tyr Glu Tyr Lys Asp Gly Val Gln Pro Ile Gln Gly Asn
1650 1655 1660

Glu Tyr Leu Leu Gln Lys Cys Asp Asn Asn Lys Cys Ser Cys Met Asp
1665 1670 1675 1680

Gly Asn Val Leu Ser Val Ser Pro Lys Glu Lys Pro Phe Gly Lys Tyr
1685 1690 1695

Ala His Lys Tyr Pro Glu Lys Cys Asp Cys Tyr Gln Gly Lys His Val
1700 1705 1710

Pro Ser Ile Pro Pro Pro Pro Pro Pro Val Gln Pro Gln Pro Glu Ala
1715 1720 1725

Pro Thr Val Thr Val Asp Val Cys Ser Ile Val Lys Thr Leu Phe Lys
1730 1735 1740

Asp Thr Asn Asn Phe Ser Asp Ala Cys Gly Leu Lys Tyr Gly Lys Thr
1745 1750 1755 1760

Ala Pro Ser Ser Trp Lys Cys Ile Pro Ser Asp Thr Lys Ser Gly Ala
1765 1770 1775

Gly Ala Thr Thr Gly Lys Ser Gly Ser Asp Ser Gly Ser Ile Cys Ile
1780 1785 1790

Pro Pro Arg Arg Arg Arg Leu Tyr Val Gly Lys Leu Gln Glu Trp Ala
1795 1800 1805

Thr Ala Leu Pro Gln Gly Glu Gly Ala Ala Pro Ser His Ser Arg Ala
1810 1815 1820

Asp Asp Leu Arg Asn Ala Phe Ile Gln Ser Ala Ala Ile Glu Thr Phe
1825 1830 1835 1840

Phe Leu Trp Asp Arg Tyr Lys Glu Glu Lys Lys Pro Gln Gly Asp Gly
1845 1850 1855

Ser Gln Gln Ala Leu Ser Gln Leu Thr Ser Thr Tyr Ser Asp Asp Glu
1860 1865 1870

Glu Asp Pro Pro Asp Lys Leu Leu Gln Asn Gly Lys Ile Pro Pro Asp
1875 1880 1885

Phe Leu Arg Leu Met Phe Tyr Thr Leu Gly Asp Tyr Arg Asp Ile Leu
1890 1895 1900

[3 residues illegible] Gly Asn Thr Ser Asp Ser Gly Asn Thr Asn Gly Ser Asn
1905 1910 1915 1920

Asn Asn Asn Ile Val Leu Glu Ala Ser Gly Asn Lys Glu Asp Met Gln
1925 1930 1935

Lys Ile Gln Glu Lys Ile Glu Gln Ile Leu Pro Lys Asn Gly Gly Thr
1940 1945 1950

Pro Leu Val Pro Lys Ser Ser Ala Gln Thr Pro Asp Lys Trp Trp Asn
1955 1960 1965

Glu His Ala Glu Ser Ile Trp Lys Gly Met Ile Cys Ala Leu Thr Tyr
1970 1975 1980

Thr Glu Lys Asn Pro Asp Thr Ser Ala Arg Gly Asp Glu Asn Lys Ile
1985 1990 1995 2000

Glu Lys Asp Asp Glu Val Tyr Glu Lys Phe Phe Gly Ser Thr Ala Asn
2005 2010 2015

Lys His Gly Thr Ala Ser Thr Pro Thr Gly Thr Tyr Lys Thr Gln Tyr
2020 2025 2030

Asp Tyr Glu Lys Val Lys Leu Glu Asp Thr Ser Gly Ala Lys Thr Pro
2035 2040 2045

Ser Ala Ser Ser Asp Thr Pro Leu Leu Ser Asp Phe Val Leu Arg Pro
2050 2055 2060

Pro Tyr Phe Arg Tyr Leu Glu Glu Trp Gly Gln Asn Phe Cys Lys Lys
2065 2070 2075 2080

Arg Lys His Lys Leu Ala Gln Ile Lys His Glu Cys Lys Val Glu Glu
2085 2090 2095

Asn Gly Gly Gly Ser Arg Arg Gly Gly Ile Thr Arg Gln Tyr Ser Gly
2100 2105 2110

Asp Gly Glu Ala Cys Asn Glu Met Leu Pro Lys Asn Asp Gly Thr Val
2115 2120 2125

Pro Asp Leu Glu Lys Pro Ser Cys Ala Lys Pro Cys Ser Ser Tyr Arg
2130 2135 2140

Lys Trp Ile Glu Ser Lys Gly Lys Glu Phe Glu Lys Gln Glu Lys Ala
2145 2150 2155 2160

Tyr Glu Gln Gln Lys Asp Lys Cys Val Asn Gly Ser Asn Lys His Asp
2165 2170 2175

Asn Gly Phe Cys Glu Thr Leu Thr Thr Ser Ser Lys Ala Lys Asp Phe
2180 2185 2190

Leu Lys Thr Leu Gly Pro Cys Lys Pro Asn Asn Val Glu Gly Lys Thr
2195 2200 2205

Ile Phe Asp Asp Asp Lys Thr Phe Lys His Thr Lys Asp Cys Asp Pro
2210 2215 2220

Cys Leu Lys Phe Ser Val Asn Cys Lys Lys Asp Glu Cys Asp Asn Ser
2225 2230 2235 2240

Lys Gly Thr Asp Cys Arg Asn Lys Asn Ser Ile Asp Ala Thr Asp Ile
2245 2250 2255

Glu Asn Gly Val Asp Ser Thr Val Leu Glu Met Arg Val Ser Ala Asp
2260 2265 2270

Ser Lys Ser Gly Phe Asn Gly Asp Gly Leu Glu Asn Ala Cys Arg Gly
2275 2280 2285

Ala Gly Ile Phe Glu Gly Ile Arg Lys Asp Glu Trp Lys Cys Arg Asn
2290 2295 2300

Val Cys Gly Tyr Val Val Cys Lys Pro Glu Asn Val Asn Gly Glu Ala
2305 2310 2315 2320

Lys Gly Lys His Ile Ile Gln Ile Arg Ala Leu Val Lys Arg Trp Val
2325 2330 2335

Glu Tyr Phe Phe Glu Asp Tyr Asn Lys Ile Lys His Lys Ile Ser His
2340 2345 2350

Arg Ile Lys Asn Gly Glu Ile Ser Pro Cys Ile Lys Asn Cys Val Glu
2355 2360 2365

Lys Trp Val Asp Gln Lys Arg Lys Glu Trp Lys Glu Ile Thr Glu Arg
2370 2375 2380

Phe Lys Asp Gln Tyr Lys Asn Asp Asn Ser Asp Asp Asp Asn Val Arg
2385 2390 2395 2400

Ser Phe Leu Glu Thr Leu Ile Pro Gln Ile Thr Asp Ala Asn Ala Lys
2405 2410 2415

Asn Lys Val Ile Lys Leu Ser Lys Phe Gly Asn Ser Cys Gly Cys Ser
2420 2425 2430

Ala Ser Ala Asn Glu Gln Asn Lys Asn Gly Glu Tyr Lys Asp Ala Ile
2435 2440 2445

Asp Cys Met Leu Lys Lys Leu Lys Asp Lys Ile Gly Glu Cys Glu Lys
2450 2455 2460

Lys His His Gln Thr Ser Asp Thr Glu Cys Ser Asp Thr Pro Gln Pro
2465 2470 2475 2480

Gln Thr Leu Glu Asp Glu Thr Leu Asp Asp Asp Ile Glu Thr Glu Glu
2485 2490 2495

Ala Lys Lys Asn Met Met Pro Lys Ile Cys Glu Asn Val Leu Lys Thr
2500 2505 2510

Ala Gln Gln Glu Asp Glu Gly Gly Cys Val Pro Ala Glu Asn Ser Glu
2515 2520 2525

Glu Pro Ala Ala Thr Asp Ser Gly Lys Glu Thr Pro Glu Gln Thr Pro
2530 2535 2540

Val Leu Lys Pro Glu Glu Glu Ala Val Pro Glu Pro Pro Pro Pro Pro
2545 2550 2555 2560

Pro Gln Glu Lys Ala Pro Ala Pro Ile Pro Gln Pro Gln Pro Pro Thr
2565 2570 2575

Pro Pro Thr Gln Leu Leu Asp Asn Pro His Val Leu Thr Ala Leu Val
2580 2585 2590

Thr Ser Thr Leu Ala Trp Ser Val Gly Ile Gly Phe Ala Thr Phe Thr
2595 2600 2605

Tyr Phe Tyr Leu Lys Val Asn Gly Ser Ile Tyr Met Gly Met Trp Met
2610 2615 2620

Tyr Val Asp Val Cys Glu Cys Met Trp Met Tyr Val Asp Val Cys Gly
2625 2630 2635 2640

Cys Val Leu Trp Ile Cys Ile Cys Asp Tyr Val Trp Ile Tyr Ile Tyr
2645 2650 2655

Ile Tyr Ile Cys Leu Cys Ile Cys Val Phe Gly Tyr Ile Tyr Val Tyr
2660 2665 2670

Val Tyr Asp Phe Leu Tyr Met Tyr Leu Trp Val Lys Asp Ile Tyr Ile
2675 2680 2685

Trp Met Tyr Leu Tyr Val Phe Tyr Ile Tyr Ile Leu Tyr Ile Cys Ile
2690 2695 2700

Tyr Ile Lys Lys Glu Ile
2705 2710



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19124 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ACATTTTTTC GTAATATATA TATATATATA TATATATAAT TCTCTTTTTC TAATATATAT 
6 0 

ATCCTTCTAT TTTCGATTTT TTCATTTTTT TCCAGTATTA ATTTATTTAT TTATTTGTGA 12 0 
TATTTTATAA TATATTATTT AAATGTGTAT TTATATATGT GTTTTATTTT TGTTATTAAT 18 0 
TTGAATAATC CGAGCGAAAA AAAATATATA ATCTCATATA AAAATTATTT ATAATACAAT 24 0 
ATTATATAGT TTCCTATTAA AATAAATTAA TATAATATAC AATAATATTT CTTGTTATTT 3 00 
TTATAAATAT AACTAATTTC TTATTTTTAT TTAACTTTAT TCCTTTTTAA TTTCTTAATT 3 60 
CTTTTATGCA AACAAAAAAC ATAAAGTAAT TCTACATATC AACAAAAAAA AAAAAAAAAA 42 0 
AAAAAAAAAA ATTTATTATA ATATAATAAA AAATATAAAG ACATACGTTC ACTTATTATT 4 8 0 
^ ATAAATGATT TATTACGATT AAAACATATT GAGATTATAA TAATATAATT TAACATAGAA 54 0 
AGAGTTAAGA ATACATTTTT tTTTTTTTTT TGATATGTAA TTCAACATAT ATATATATAT 600 
ATATCTTTTT AATTTAATTA AATAAAATTC CTTATTATTC ATATTGTTTC TTTTATCACA 66 0 
TGTGAAATAT TAAAAATAAT TTTCGATTTT' AT CG AT AT AT TTATGTCGTT TATATACTTA 72 0 
TATAGGTCTT TATAACTATT GATTAATAGA AGGTAATAGC CTAATAATAT AAATACTCGT 78 0 
ATTTATAAAT TCATTTATAT ATTTCAAATA TATTTCGATG GTTTATTTTC AAATACAATT 84 0 
AATTAGATTT CTTAAATATT TCTTCATTTA TTCATTTTTA TAGCATATAC ATGCACATTA 90 0 
TAAATTATTA ATAAAAAATT TTTATTTTAA TATATAATAA CAATTTTCAT ACATTACATT 96 0 
TTTCACACAA CATTTAAGTT GTCATAATGT AACACATTAA ATAATATATT ACTTATATAT 1020 
ATATAATTAT TAATTATATA TTAAATAAAA ATGTATTATC- GCCTGTATTA TCATAGTATA 1080 
TATAATGTTG TATAACGCTT CAAAATATAT ATAATAATAT AATTAAAAAT ATATATATAG 1140 
TAATTAATTA TTTTGTTATG TTATGTAATA ATGCAATTAA TATAAGATAA AATTCTATAG 1200 
CTATTATTTA AAATATATAT ATATATATAT ATATATATAT ATATTAGTAT ATGTTATCAA 126 0 
AATATTATAA TATGTAAATT ATTAATAAAA TATATTTGTA TAACATACAA GACTAAAGAA 132 0 
AACTATACAA TCTGGTATCT AATAGTATAT ATATATAATA TCTTTTTTAT TTAATTGTTC 138 0 
TCTCTTTTTT TTTTTTTTAA ATAATAATAA ATATTAATAT ATTTTTTTTC ATAATTATAT 1440 
GATTTAGTAT TTTAATAATA AATAAATCTT TTAAAAAACT TCAAAACATT TTTGCATAAA 1500 
ATAATATTAA TATTAGTAAC CACCTAGATA AATTAGAGAG AAACGTAGAA CATACCAAAA 1560 
AAAATTAGAA CAAAAAGAAT ATTACAAAAA ATAATAAAAT TAAATTATTT CTTTACTATT 1620 
AATTTAAAGT TTTTTTTCAT ATCATATATT ATGATACACA ATGTTTGTTG TTAAATGTTT 1680 
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TATATACATG CAATGATATG TTTCTGTTGG 
AAATGTATTG TACACCTTTA GCAACTATTA 
GAAAATATGT TATATTATTA CAATATCTTA 
AAAATTACAA TTGTAATTAA TCGTATGACA 
5 CAAAATTATA AAAAATATGG AAATGTTTTG 
TTATTTTATT ATTTATTTTT ttTTTTTTTT 
AAGTAAAAAA TATATATATT TACATAATGG 
ATAATAATAT TTTAGATTAA ACATATGTAA 
ATATATATAT TAATTATTAA GTTATAGATT 

10 AAAATGAAAG TTCACTACAG TAATATATTA 
ATATCACGTA TGCACTAAAT AATGACAATA 
GTAAATAAAA AAATATACAT ATATACAAAA 
GATAAATATC CAGAAGAACT ATTACATCAC 
ACAACCACTA GGTTATTATG CGAATGTGAC 

15 GAAATGATAT TAGTGATGGA. AAATTTCAAT 
AATGAACGCA TGCAAGAAAA ACGAAAAATA 
AAAATTATTT TAAAAGATAA AATCGAAAAG 
ACGAATATAA AGACTGAGGA TATACCTACT 
GTGGAAAAAA CGTGTTTGAA ATGTGGAGGT 

20 GGTTTATTAG GAGAAATAGG TGGACTTGTT 
AAAGCTTTTC TTACTTTTGC TCAAAAGGAA 
ACTGCTCGTA TTGATACAGT TATTTAAGGA 
AATGGTTCTA CGTTGGGGAA AGTTATTACC 
ACTACGGCAC TATATAATGA ATATGTAAGC 

25 AAATTAATTT GTGCTTTTGG GATGAGAGAC 
CGAGACGTTA TAGGATCAAG TGTAAAAGGA 
CAAGCTGCTG AGACAGCTGC TAACGAAACT 
AAAATAACAT CTGCAGGTGC TAATTTACAC 
TTGGTTATAG TTTTGGTTAT GGTAATTATT 

30 AAAATGAAGA AAAAATTGCA ATATATAAAA 
CTATTAGCGG TAATTTAAAG TATTGTGAAT 
AATTAATTTT TTTTTATAAT ATTATATTTT 
TATTATATGA TTATTTAATT ATTATACTTA 
ATATATGTAT CTATCTATCT ATCTATCTAT 

35 TTATTATTAT TAGATGCATA TTAGTGATGA 
ACATAATAAT ATATTAAATT AATAGAACTT 
AGAAATTTGA AAAAGTAATT TACACATGAT 
TATTTATTTA TAAAAATTGT TTAATATAAG 
TTAGCTTTCC ATTATACAAA TATATATTTC 

40 TAAAAAAAGT ATAATATAAT AAAATATCTA 
• AAATTTTAAT TTTATACGAT AGAATAAATT 
AAGAACCTAT TACAATATAG TAAC7VACTGG 
TGTAAAAGGA TAGTTGTTAA AGGCTTTTTT 
TATAATAGAT ATCTTAACAT ACAACTTTGC 

45 AGAT^TATTA TAAATAATAT TATAAAAAAT 
TATTAATTTA ATTTTATTTT ATTGTTCTAA 
TCTAATATAA TTAAGATATT TCTAATATTA 
AGAATAATTT TTTACTTATT TATTATAATA 
GATGACAAAA AAAAAACTTT TAAAATGGAA 

50 TAATTGGTGA AATAGTTGTA ACTTATACAA 
TAATATTGTT TATGTATCGT AATATATATT 
TTCTAATAAT ATATTCATAT GTAGTCATAG 
TATTATTGTA TATATTAAAT AAGTAACACA 
ATAATATATT TTTATGTTAT ATATTATTAG 

55 ATGAAAATTT TTGTATATGA TATAGTTATA 
ATGGAAAGCA TAAAAAATGT TACTGTAATA 
TTATCTTAAA AAGGTTCCTA TTATAACATT 
TAACTACATT TACATAATGA AATTTCGATT 
ATTATTTATA TGTGAATGCG TTCTATATAA 

60 ATAAGAAATA AATATCCTGA TTTTGTAGTT 
TATATATTAT ATATATCTTT ACAACAAGTA 
GAAAATAAAA ATAATAAAAT AAGAATACTG 
AATGTAACAT AATTACAAAT ACGTAACATG 
AGGATAAATA TAAATATTTA AAATTATATT 



AATATGTATT ATATACTTAT ATGTTCTAAT 174 0 
CTACACACAT TTTTATATAA TTTATAACAG 180 0 
ATGTGTTTTT GCAAAAATAT AAAAAACAAG 1860 
TAAAATTATA TTATATTAGA AATTAAAATT *1"92 0 
TTATATTATT TTTTTAAAAA TTTAATTATT 198 0 
GTGTTCTAAA TAAAAAGGCA AATATGATTC 2 04 0 
CAAAATAATT GTTTATTATA TTATATGACT 2100 
TTCATTTAAC AGAATAAAAT AAAATATTAT 2160 
TAATAAAAAT ATATTATACA TATGAGATTA 222 0 
TTATATGTCG TCAATTTAAG TATATTCTTA 2280 
ATAATATATA TGTAACATTT TATAATTGAT 2340 
ACATATATGA TATTTACATT CTTTTTTATA 2400 
TTCACTTCAT ATACCAAACA CGAAAAAAAT 246 0 
TTATATACGT CCATTTATGA TAATGACCCG 252 0 
AAACAGACAG AAGAAAGGTT TCATGAATAC 258 0 
TGTAAAGAAC AATGCGAAAA GGATATACAA 264 0 
GAATTAACAG AAAAGTTAGA GGCATTGGAA 2700 
TGTGTATGCG AAAAATCAGT AGCAGATAAA 2760 
ATATTGGGTG TTGGTGTGAC TCCATCTTTA 282 0 
ATAAATAATT GGACAAATAC TCCTTTTTAT 2880 
GGTATAGCTG CCGGTAAAAT TGCTAGTGAT 2 94 0 
ATAATATCAA ATTTTGATGT GCACACTATA 3 000 
GTAGAAGCTC TTAAGGATGA CACTACTCTT 3 06 0 
ATGTGTGTAA ATACGAACCC TGTCGAAGAC 3120 
GGTCTAGTTG CAGGGCAATA TGCTTCATCG 318 0 
ATTATTAGAA AAGCTGCAAA CGCTGCTTCA 324 0 
ACTTCCGGAA TGATCGAAGC CGAGTTAAGT 33 00 
AGTGCAATTA CTTACTCAGT AACTGCGATA 3 360 
TATTTAATAT TACGTTATCG TAGAAAAAAA 3420 
TTATTAAAGG AATAGATATA CGATGTCGAG 348 0 
TTTTCATTTA ATATGCTATG ATCATTTGAT 3 54 0 
TTTATACCTT GGATTCTTAC ATTGTTTTAT 3 600 
TATATATATA TATTTTTACA TTAAGATATT 3 66 0 
ATATATATAT ATATATATAT ATTATAATAA 3 72 0 
TTATAATAAT AACCTATTGA AGAGAATAGA 3780 
CATTTTTATT GTTATATGTA TATAAAAATA 3 84 0 
AATGTATTTT ATTTTATTTG TGTTGTTTTA 3 900 
TTGTTATTAT AATTTTTTAA TATGGCACCA 3 96 0 
CTCATTAGAA TCTGAATATT TATTGTATTA 4 02 0 
AGATTTTTTC TAATTTGTTT AATTTATAAT 4 080 
ATAATCAACA TATATATATG TATTCATCTT 414 0 
TTCCTTTTTA TTATAAATAA CATT^GAATG 42 00 
AATATTGATT ATAAATGTTT GTAAGATATA 42 60 
ATAATTGTAA TTAAAAAAAT ATATATAATA 4320 
TAAGCATAAA TGTCACAATA AATTTTTTTT 43 80 
AATATATTGA TTATGAGAAT ATTATTTGTG 444 0 
ATTTATATAT ATATATTTAA AAGTATTTTA 4500 
TGAAATATGC ATGGAGTATA TATAAATATT 4560 
AATATGCATA TAATAAAATA CTATATAGTA 462 0 
ACATGTTGCA TTCATAATTT AGAGATTATG 4680 
AATATAATTG TTTTTTTAGT ATGTATGGTA 4740 
TGTCAATGAA TATAAAATAT GGTATATTTA 48 00 
GAACATTATA TATAGTAATA AATAGAAGAA 4860 
TTATTATAAA CGGGAAAATT CATAATATTT 4920
AGTTAAAAAA AAAAAAAAAC AAGAACAAAA 4 980 
GGATAAAATA TATTATATAA AATGTTTATT 504 0 
AAAAAAAATT TGTCCCATTT TATAAATAAT 5100 
TTGTGTTTTT TTGATGAATA TTATGGACTA 5160 
TAATAATAAT TTTATTTAAA AAAATGAAAA 522 0 
CCAATAGCTT AATATAATTA TGGACTCATA 52 8 0 
ATAAGTAAAT ATTATTTTAA TCTTAATAAG 534 0 
AATAATAAGT CATATTATAC ATTTTTTAAA 54 00 
TATTATAGAA ATAATAAGAA TTTAATATTA 54 60 
TTTTTATGTC AATTTATGTT ATATTATATT 552 0 
ATATTAACAT GATTAGTTTT TTGAAAAATA
TTAAAATAAT AGTATTTCAT ACAAAATACT 
ATATATATAT TTATGTGTTT TTGATTGGGT 
TTCATTATAT ATTTATATGT G7VATAGATAC 
GTCTGTGTTA AGATAGATAT GCATTACAGT
GTACATATAT ATAAAAAATA GATAACTAAC 
TAAAATATAT ATATATATAT ATATATAAAG 
TTTATTATAT CATCCTTTTA TTATTATAAT 
TTGTTATTAT AATATAACAA ATATAAAACA 

TTCTACATAT ATGCATATAT ATATATATAT
ATATGTATGA TTTTATACTA TTTTTATACA 
AAAGATATTA TTAATATTTA TATAGTAGCA 
CATTTATATA AATATATAGA ACATGAACAT 
ATTTATAATG TGTATTTTTA CTTATTTTTT 

AAAATGCATG AAATACATAA AAAAATACAA
TATAATATAA TATAATATAA TAATATATTT 
GATGCTATAT ATATTATTAT ATAATAAATT 
TAATATACTA CTTTTAATAT AATACAACAA 
TATATATGAA TATATAAATA TGATAGATAA 

GTCTCTTTTG TTATCTCTAA TATATATATA
AATATACATA TATTAATGTT AATAATTAAA 
ATATGTTTGT ATTTTCGTAT TTTTTTTTTC 
ATAAAAAAAA TAATATATAT ATAATTAAAT 
ATTTCTGATT ATATTTTTTT TTTGTTAGAA 

ATATATATAT TTTTTTTAAA AATATATAAA
TATTATTTTT TTAACATATA CATATATTGT 
ATATATATAT ATATACAATA TTTATATATA 
TATATACATT CACAAAAGTG TTATTATTCT 
ATACATATAT ACATACCCCC ACGTACGTAC 

TATGTATGCC ACGATATAAA CCACGTACCA
AAAAATGGGG CCCAAGGAGG CTGCAGGTGG 
TATGTTTGAT AGGATAGGAA AAGATGTGTA 
TGGTAAAGGC TTGCAAGGAC GTTTGTCAGA 
ACAAACACCA GAAGATCCAT GCGATCTTGA 

TGTAATTAAT CCGTGCGCTG ATAGATCTGA
ATGTACACAT AATAGAATAA AAGATAGTCA 
ATATAGGCGA TTGCATGTAT GCGATCAAAA 
AAATACTCAT AATTTATTGG TAGATGTGTG 
AACACAAGAT TATCCAAAAT ATCAAGCAAC 

TATGCTGGCA CGAAGTTTTG CGGACATAGG
AGGTAATCCA CAAGAAATAA AACAAAGACA 
CGGGAAAATA TATGAAAAAT TGAATGGCGC 
TTTTAAATTA CGAGAAGATT GGTGGACTGC 
ATGTAACGCT TGGGGTAATA CATATTTTCA 

AGGTTACTGC CGGTGTAACG ACGACCAAGT
TCTTCGCTGG TTCGAGGAAT GGGCAGAAGA 
AGATGTTAAA AGAAATTGTC GTGGAAAAGA 
TAATGGCTAC GATTGCGAAA AAACTAAACG 
ATGCATTAGC TGTTTGTATG CATGTAATCC 

ACAATTTGAC AAACAGAAAA AAAAATATGA
ATCAGGTGGT AGTAGGCAAA AACGGGATGC 
ATATGAAAAA AAATTTTATG ACGAACTTAA 
TTTGGAAAAA TTAAGTAATG AAGAAATATG 
AATTGATTTT AAAAACGTTA ATAGTGATAG 

AAGTCAAGGA ACATTTTATC GTTCAAAATA
AAAGGTAAAT AATGGTGGTA GTAGTAATGA 
GAGTGGAAAA CTTTATGAGC CTAAACCCGA 
AAGTGGTAAA GGACATGATG ATATTGAAGA 
TGGTGATACA ATAAATAGTG GTGGTAGTGG 

TAGACAGGAA TTGTATGAAG AATGGAAATG
ACACGATGAG GATGACGAGG AGGATTATGA 
ATTAAAAAAC CAAAAAAAGA ATAAAGAAGA 
TGAAATCCAA AAGACATTCA ATCCTTTTTT 
TTCCATACAT TGGAAAAAAA AACTTCAGAG 



TTTAAATATC ATATAATAAT AATAAATTAG 5580 
AACTTATAAG TATATCATAT AATATTATAT 5640 
GTATATAAGG CTATAAGTAT ATATGGGTTG 5700 
ATATAAGTTA ATATATTTAT TTGTGTATAT 5760
TAAGGGTTAT AGTTTTTTTT TTTTTTTTTT 5820 
AATATGCATA TTACAAGAAT AATATTTGTA 5880 
ACATTAAAAC TATACTAATA GGTAATTAGT 594 0 
TTTTTTTGTT TTACTTCTTG TCGTTCTTTT 6000 
ATATCAGTAT TTGGAATATA AATAAATTTA 606 0 
ATATATATAT ATATATATAT ATATATATAT 6120 
TGCATTTTTA TATATTTTAG TATATACTTT 6180 
TATATGTATT TATATTATAA CAAATATTTT 6240 
TTTATTAATA ACTCATATTT GAATATATAT 63 00 
TATATTATAC AATAAAATTT TGAAATTCAT 63 6 0 
CAAAACAAAT GATAAAAACA TTTTTATTAA 6420 
TTCCTGTTAT TTATTTATCA TTTTTTTTTT 6480 
ATAATATATA ACAACAAAAA TTAATAATAA 654 0 
TACAAAGAAT ATGTATCTAT ATCAATTATA 6600 
TATAGATAGA GAGAAACGAA GAACATATTT 6660 
TATATAATAA ATTAAAATAA AGTCAAAAAA 672 0 
TATATAAACA CGTTGCATAT ATACTTTTTT 6780 
TCATTTATAA TTTTACTTAA TAAATAAAAC 684 0 
AGATAAATAA AGGAATACAT AAAATATAAT 6 900 
TATTTAAATT TATTATAAAT TTATTAATAT 6 96 0 
ACTAATAATT ATTATTATAT ACATATTAAA 702 0 
AATATTATAA TAGTACAACT ATTAATATAT 7080 
TTGTAATACA TAAATTATAC CTTACATATA 714 0 
TATTCTACCA TATTATAATA CTACTGTAAT 7200 
GAAACACCAC CAAACCATGT ATCACGTATG 726 0 
CGTATGACAT AATGTAATGG TGGAGTTAGC 7320 
GGATGATATT GAGGATGAAA GTGCCAAACA 73 80 
CGATAAAGTA AAAGAGGAAG CTAAAGAACG 744 0 
AGCAAAATTT GAGAAAAATG AAAGCGATCC 7500 
TCATAAATAT CATACAAATG TAACTACTAA 7560 
CGTGCGTTTT TCCGATGAAT ATGGAGGTCA 762 0 
ACAGGGTGAT AATAAAGGTG CATGTGCTCC 768 0 
TTTAGAACAG ATAGAGCCTA TAAAAATAAC 774 0 
TATGGCAGCA AAATTTGAAG GACAATCAAT 7800 
ATATGGTGAT TCTCCTTCTC AAATATGTAC 786 0 
GGACATTGTC AGAGGAAGAG ATTTGTATTT 792 0 
ACAATTAGAA AATAATTTGA AAACAATTTT 798 0 
AGAAGCACGC TACGGAAATG ATCCGGAATT 8040 
TAATCGAGAA ACAGTATGGA AAGCCATCAC 8100 
TGCAACGTGC AATAGAGGAG AACGAACTAA 816 0 
TCCCACATAT TTTGATTATG TGCCGCAGTA 822 0 
TTTTTGTAGG AAAAAAAATA AAAAAATAAA 828 0 
TAAAGAGGAT AAGGATCGAT ATTGTAGCCG 834 0 
AGCGATTGGT AAGTTGCGTT ATGGTAAGCA 84 0 0 
TTACGTTGAT TGGATAAATA ACCAAAAAGA 84 60 
TGAAGAAATA AAAAAATATG AAAATGGAGC 852 0 
AGGTGGTACA ACTACTACTA ATTATGATGG 8580 
TAAAAGTGAA TATAGAACCG TTGATAAATT 864 0 
CACAAAAGTT AAAGAGGAAG AAGGAGGAAC 8700 
TACTAGTGGT GCTAGTGGCA CTAATGTTGA 876 0 
TTGCCAACCC TGCCCTTATT GTGGAGTGAA 8820 
ATGGGAAGAG AAAAATAATG GCAAGTGCAA 888 0 
CAAAGAAGGT ACTACTATTA CAATCCTTAA 894 0 
AAAATTAAAC AAATTTTGTG ATGAAAAAAA 9000 
TACGGGTGGT AGTGGTGGTG GTAACAGTGG 9060 
TTATAAAGGT GAAGATGTAG TGAAAGTTGG 912 0 
AAATGTAAAA T^TGCAGGCG GATTATGTAT 918 0 
AGGTGGAAAT ACGTCTGAAA AGGAGCCTGA 9240 
TTACTATTGG GTTGCACATA TGTTAAAAGA 93 00 
ATGTTTACAA AATGGTAACA GAATAAAATG 9360 
TGGAAACAAT AAATGTAATA ATGATTGTGA ATGTTTTAAA AGATGGATTA CACAAAAAAA 9420
AGACGAATGG GGGAAAATAG TACAACATTT TAAAACGCAA AATATTAAAG GTAGAGGAGG 9480 
TAGTGACAAT ACGGCAGAAT TAATCCCATT TGATCACGAT TATGTTCTTC AATACAATTT 9540 
GCAAGAAGAA TTTTTGAAAG GCGATTCCGA AGACGCTTCC GAAGAAAAAT CCGAAAATAG 9600
TCTGGATGCA GAGGAGGCAG AGGAACTAAA ACACCTTCGC GAAATCATTG AAAGTGAAGA 9660 
CAATAATCAA GAAGCATCTG TTGGTGGTGG CGTCACTGAA CAAAAAAATA TAATGGATAA 9720 
ATTGCTCAAC TACGAAAAAG ACGAAGCCGA TTTATGCCTA GAAATTCACG AAGATGAGGA 9780 
AGAGGAAAAA GAAAAAGGAG ACGGAAACGA ATGTATCGAA GAGGGCGAAA ATTTTCGTTA 9840 
TAATCCATGT AGTGGCGAAA GTGGTAACAA ACGATACCCC GTTCTTGCGA ACAAAGTAGC 9900 
GTATCAAATG CATCACAAGG CAAAGACACA ATTGGCTAGT CGTGCTGGTA GAAGTGCGTT 9960 
GAGAGGTGAT ATATCCTTAG CGCAATTTAA AAATGGTCGT AACGGAAGTA CATTGAAAGG 10020 
ACAAATTTGC AAAATTAACG AAAACTATTC CAATGATAGT CGTGGTAATA GTGGTGGACC 10080 
ATGTACAGGC AAAGATGGAG ATCACGGAGG TGTGCGCATG AGAATAGGAA CGGAATGGTC 1014 0 
AAATATTGAA GGAAAAAAAC AAACGTCATA CAAAAACGTC TTTTTACCTC CCCGACGAGA 10200 
ACACATGTGT ACATCCAATT TAGAAAATTT AGATGTTGGT AGTGTCACTA AAAATGATAA 10260
GGCTAGCCAC TCATTATTGG GAGATGTTCA GCTCGCAGCA AAAACTGATG CAGCTGAGAT 10320 
AATAAAACGC TATAAAGATC AAAATAATAT ACAACTAACT GATCCAATAC AACAAAAAGA 10380 
CCAGGAGGCT ATGTGTCGAG CTGTACGTTA TAGTTTTGCC GATTTAGGAG ACATTATTCG 10440 
AGGAAGAGAT ATGTGGGATG AGGATAAGAG CTCAACAGAC ATGGAAACAC GTTTGATAAC 10500 
CGTATTTAAA AACATTAAAG AAAAACATGA TGGAATCAAA GACAACCCTA AATATACCGG 10560
TGATGAAAGC AAAAAGCCCG CATATAAAAA ATTACGAGCA GATTGGTGGG AAGCAAATAG 10620 
ACATCAAGTG TGGAGAGCCA TGAAATGCGC AACAAAAGGC ATCATATGTC CTGGTATGCC 10680 
AGTTGACGAT TATATCCCCC AACGTTTACG CTGGATGACT GAATGGGCTG AATGGTATTG 1074 0 
TAAAGCGCAA TCACAGGAGT ATGACAAGTT AAAAAAAATC TGTGCAGATT GTATGAGTAA 10 800 
GGGTGATGGA AAATGTACGC AAGGTGATGT CGATTGTGGA AAGTGCAAAG CAGCATGTGA 10860
TAAATATAAA GAGGAAATAG AAAAATGGAA TGAACAATGG AGAAAAATAT CAGATAAATA 10920 
CAATCTATTA TACCTACAAG CAAAAACTAC TTCTACTAAT CCTGGCCGTA CTGTTCTTGG 1098 0 
TGATGACGAT CCCGACTATC AACAAATGGT AGATTTTTTG ACCCCAATAC ACAAAGCAAG 11040 
TATTGCCGCA CGTGTTCTTG TTAAACGTGC TGCTGGTAGT CCCACTGAGA TCGCCGCCGC 11100 
CGCCCCGATC AGCCCCTACA GTACTGCTGC CGGATATATA CACCAGGAAA TAGGATATGG 11160 
GGGGTGCCAG GAACAAACAC AATTTTGTGA AAAAAAACAT GGTGCAACAT CAACTAGTAC 1122 0 
CACGAAAGAA AACAAAGAAT ACACCTTTAA ACAACCTCCG CCGGAGTATG CTACAGCGTG 11280 
TGATTGCATA AATAGGTCGC AAACAGAGGA GCCGAAGAAA AAGGAAGAAA ATGTAGAGAG 113 4 0 
TGCCTGCAAA ATAGTGGAGA AAATACTTGA GGGTAAGAAT GGAAGGACTA CAGTAGGTGA 114 0 0 
ATGTAATCCA AAAGAGAGTT ATCCTGATTG GGATTGCAAA AACAATATTG ACATTAGTCA 114 60 
TGATGGTGCT TGTATGCCTC CAAGGAGACA AAAACTATGT TTATATTATA TAGCACATGA 1152 0 
GAGTCAAACA GAAAATATAA AAACAGAGGA TAATTTGAAA GATGCTTTTA TTAAAACTGC 11580
AGCAGCAGAA ACTTTTCTTT CATGGCAATA TTATAAGAGT AAGAATGATA GTGAAGCTAA 1164 0 
AATATTAGAT AGAGGCCTTA TTCCATCCCA ATTTTTAAGA TCCATGATGT ACACGTTTGG 117 0 0 
AGATTATAGA GATATATGTT TGAACACAGA TATATCTAAA AAACAAAATG ATGTAGCTAA 1176 0 
GGCAAAAGAT AAAATAGGTA AATTTTTCTC AAAAGATGGC AGCAAATCTC CTAGTGGCTT 1182 0 
ATCACGCCAA GAATGGTGGA AAACAAATGG TCCAGAGATT TGGAAAGGAA TGTTATGTGC 118 80 
CTTAACAAAA TACGTCACAG ATACCGATAA CAAAAGAAAA ATCAAAAACG ACTACTCATA 1194 0 
CGATAAAGTC AACCAATCCC AAAATGGCAA CCCTTCCCTT GAAGAGTTTG CTGCTAAACC 12000 
TCAATTTCTA CGTTGGATGA TCGAATGGGG AGAAGAGTTT TGTGCTGAAC GTCAGAAGAA 12060
GGAAAATATC ATAAAAGATG CATGTAATGA AATAAATTCT ACACAACAGT GTAATGATGC 1212 0 
GAAACATCGT TGTAATCAAG CATGTAGAGC ATATCAAGAA TATGTTGAAA ATAAAAAAAA 12180 
AGAATTTTCG GGACAAACAA ATAACTTTGT TCTAAAGGCA AATGTTCAGC CCCAAGATCC 1224 0 
AGAATATAAA GGATATGAAT ATAAAGACGG CGTACAACCG ATACAGGGGA ATGAGTATTT 12 3 00 
ACTGCAAAAA TGTGATAATA ATAAATGTTC TTGCATGGAT GGAAATGTAC TTTCCGTCTC 123 60 
TCCAAAAGAA AAACCTTTTG GAAAATATGC CCATAAATAT CCTGAGAAAT GTGATTGTTA 1242 0 
TCAAGGAAAA CATGTACCTA GCATACCACC TCCCCCCCCA CCTGTACAAC CACAACCGGA 124 8 0 
AGCACCAACA GTAACAGTAG ACGTTTGCAG CATAGTAAAA ACACTATTTA AAGACACAAA 1254 0 
CAATTTTTCC GACGCTTGTG GTCTAAAATA CGGCAAAACC GCACCATCCA GTTGGAAATG 126 00 
TATACCAAGT GACACAAAAA GTGGTGCTCC TCCCACCACC GGC.^AAAGTG QT'^r^Tm^T^Q ±2'^'^^ 
TGGTAGTATT TGTATCCCAC CCAGGAGGCG ACGATTATAT GTGGGGAAAC TACAGGAGTG 12720
GGCTACCGCG CTCCCACAAG GTGAGGGCGC CGCGCCGTCC CACTCACGCG CCGACGACTT 127 80 
GCGCAATGCG TTCATCCAAT CTGCTGCAAT AGAGACTTTT TTCTTATGGG ATAGATATAA 12 84 0 
AGAAGAGAAA AAACCACAGG GTGATGGGTC ACAACAAGCA CTATCACAAC TAACCAGTAC 12900 
ATACAGTGAT GACGAGGAGG ACCCCCCCGA CAAACTGTTA CAAAATGGTA AGATACCCCC 12 96 0 
CGATTTTTTG AGATTAATGT TCTATACATT AGGAGATTAT AGGGATATTT TAGTACACGG 13 02 0 
TGGTAACACA AGTGACAGTG GTAACACAAA TGGTAGTAAC AACAACAATA TTGTGCTTGA 13 080 
AGCGAGTGGT AACAAGGAGG ACATGCAAAA AATACAAGAG AAAATAGAAC AAATTCTCCC 1314 0 
AAAAAATGGT GGCACACCTC TTGTCCCAAA ATCTAGTGCC CAAACACCTG ATAAATGGTG 132 00 
GAATGAACAC GCCGAATCTA TCTGGAAAGG TATGATATGT GCATTGACAT ATACAGAAAA 13260
GAACCCTGAC ACCAGTGCAA GAGGCGACGA AAACAAAATA GAAAAGGATG ATGAAGTGTA 13 32 0 
CGAGAAATTT TTTGGCAGCA CAGCCGACAA ACATGGCACA GCCTCAACCC CAACCGGCAG 133 80 
ATACAAAACC CAATACGACT ACGAAAAAGT CAAACTTGAG GATACAAGTG GTGCCAAAAC 13440
CCCCTCAGCC TCTAGTGATA CACCCCTTCT CTCCGATTTC GTGTTACGCC CCCCCTACTT 13 500 
CCGTTACCTT GAAGAATGGG GTCAAAATTT TTGTAAAAAA AGAAAGCATA AATTGGCACA 13 56 0 
AATAAAACAT GAGTGTAAAG TAGAAGAAAA TGGTGGTGGT AGTCGTCGTG GTGGTATAAC 13 62 0 
AAGACAATAT AGTGGGGATG GCGAAGCGTG TAATGAGATG CTTCCAAAAA ACGATGGAAC 13 680 
TGTTCCGGAT TTAGAAAAGC CGAGTTGTGC CAAACCTTGT AGTTCTTATA GAAAATGGAT 13740 
AGAAAGCAAG GGAAAAGAGT TTGAGAAACA AGAAAAGGCA TATGAACAAC AAAAAGACAA 13 800 
ATGTGTAAAT GGAAGTAATA AGCATGATAA TGGATTTTGT GAAACACTAA CAACGTCCTC 13 86 0 
TAAAGCTAAA GACTTTTTAA AAACGTTAGG ACCATGTAAA CCTAATAATG TAGAGGGTAA 13 92 0 
AACAATTTTT GATGATGATA AAACCTTTAA ACATACAAAA GATTGTGATC CATGTCTTAA 13 98 0 
ATTTAGTGTT AATTGTAAAA AAGATGAATG TGATAATTCT AAAGGAACCG ATTGCCGAAA 14 040 
TAAAAATAGT ATTGATGCAA CAGATATTGA AAATGGAGTG GATTCTACTG TACTAGAAAT 14100
GCGTGTCAGT GCTGATAGTA AAAGTGGATT TAATGGTGAT GGTTTAGAGA ATGCTTGTAG 1416 0 
AGGTGCTGGT ATCTTTGAAG GTATTAGAAA AGATGAATGG AAATGTCGTA ATGTATGTGG 1422 0 
TTATGTTGTA TGTAAACCGG AAAACGTTAA TGGGGAAGCA AAGGGAAAAC ACATTATACA 1428 0 
AATTAGAGCA CTGGTTAAAC GTTGGGTAGA ATATTTTTTT GAAGATTATA ATAAAATAAA 14340
ACATAAAATT TCACATCGCA TAAAAAATGG TGAAATATCT CCATGTATAA AAAATTGTGT 144 0 0 
AGAAAAATGG GTAGATCAGA AAAGAAAAGA ATGGAAGGAA ATTACTGAAC GTTTCAAAGA 144 6 0 
TCAATATAAA AATGACAATT CAGATGATGA CAATGTGAGA AGTTTTTTGG AGACCTTGAT 1452 0 
ACCTCAAATT ACTGATGCAA ACGCTAAAAA TAAGGTTATA AAATTAAGTA AGTTCGGTAA 14580
TTCTTGTGGA TGTAGTGCCA GTGCGAACGA ACAAAACAAA AATGGTGAAT ACAAGGACGC 14640 
TATAGATTGT ATGCTTAAAA AGCTTAAAGA TAAAATTGGC GAGTGCGAAA AGAAACACCA 14700
TCAAACTAGT GATACCGAGT GTTCCGACAC ACCACAACCG CAAACCCTTG AAGACGAAAC 14 76 0 
TTTGGATGAT GATATAGAAA CAGAGGAGGC GAAGAAGAAC ATGATGCCGA AAATTTGTGA 14 82 0 
AAATGTGTTA AAAACAGCAC AACAAGAGGA TGAAGGCGGT TGTGTCCCAG CAGAAAATAG 14 8 80 
TGAAGAACCG GCAGCAACAG ATAGTGGTAA GGAAACCCCC GAACAAACCC CCGTTCTCAA 14 94 0 
ACCCGAAGAA GAAGCAGTAC CGGAACCACC ACCTCCACCC CCACAGGAAA AAGCCCCGGC 15000 
ACCAATACCC CAACCACAAC CACCAACCCC CCCCACACAA CTCTTGGATA ATCCCCACGT 15060 
TCTAACCGCC CTGGTGACCT CCACCCTCGC CTGGAGCGTT GGCATCGGTT TTGCTACATT 1512 0 
CACTTATTTT TATCTAAAGG TAAATGGAAG TATATATATG GGGATGTGGA TGTATGTGGA 1518 0 
TGTATGTGAA TGTATGTGGA TGTATGTGGA TGTATGTGGA TGTGTTTTAT GGATATGTAT 1524 0 
TTGTGATTAT GTTTGGATAT ATATATATAT ATATATATGT TTATGTATAT GTGTTTTTGG 15300
ATATATATAT GTGTATGTAT ATGATTTTCT GTATATGTAT TTGTGGGTTA AGGATATATA 153 6 0 
TATATGGATG TACTTGTATG TGTTTTATAT ATATATTTTA TATATATGTA TTTATATTAA 15420 
AAAAGAAATA TAAAAACAAA TTTATTAAAA TGAAAAAAAG AAAAATGAAA TATAAAAAAA 154 8 0 
AATTTATTAA AATAAAAAAA AAAAAAAAAA AAAAGGAGAA AAATTTTTTA AAAAATAATA 15 540 
AAAATTATAA TAAAATATAA ATTTTGATAG AATAAAAAAT GAAAAAGATT ATCAAAAAAA 15600
AATTAAAAAA AAATTTTATA TAAAAAAAAA ATGATTATAA AAAAAATAAA AACAAAAGAA 156 6 0 
GAAAAAAAAA AACATTAAAA AAAAAAAAAT ATATATCATA AAAACAAAAA AAAAAGAAAA 15 72 0 
AAATATATTA AAATAAAAAT ATATATCATA AAATAAAAAA AAATTAAAAA AATGTTT^AAA 1578 0 
AAAAAATATA TACATAAAAT AAAAAAAATT TATTTAAATA AAAAAAAATA ATAAATAAAA 15840
AAATTTAATT AAATAAAAAA AAATAATAAA TAAAAAAATT TAATTAAATA AAAAAAAATT 15900
AAAAAAATTT AATGAAATAA AAAAAAATAA AAAAATTTAA TTAAATAAAA AAAATAAAAT 15 96 0 
AAAATTAATT ACATGCACAT ATACATACAT ATATATATAT ATATACCCAT AACTACATAC 16 020 
AACATTTACA CATACATATA TATATATATA TATACCCATA ACTACATACA CATTTACACA 16 080 
TACATATATA TATTATATAT ATATATATAT ATACCCATAA CTACATACAT ATATACATTA 1614 0 
ACAAACACAT ATATAATACC TAAATACATA TATACATACA CATATATGTT CATTTTTTTT 16200
TTTAGAAAAA AACCAAATCA TCTGTTGGAA ATTTATTCCA AATACTGCAA ATACCCAAAA 162 6 0 
GTGATTATGA TATACCGACA AAACTTTCAC CCAATAGATA TATACCTTAT ACTAGTGGTA 163 2 0 
AATACAGAGG CAAACGGTAC ATTTACCTTG AAGGAGATAG TGGAACAGAT AGTGGTTACA 163 8 0 
CCGATCATTA TAGTGATATA ACTTCCTCAG AAAGTGAATA TGAAGAGATG GATATAAATG 1644 0 
ATATATATGT ACCAGGTAGT CCTAAATATA AAACATTAAT TGAAGTGGTA CTTGAACCTA 16500
GTGGTAACAA CACAACAGCT AGTGGTAACA ACACAACAGC TAGTGGTAAC AACACAACAG 16560 
CTAGTGGTAA AAACACACCT AGTGATACAC AAAATGATAT ACAAAATGAT GGTATACCTA 16620 
GTAGTAAAAT TACAGATAAT GAATGGAATC AATTGAAAGA TGAATTTATA TCACAATATC 16680 
TACAAAGTGA ACCAAATACA GAACCAAATA TGTTAGGTTA TAATGTGGAT AATAATACCC 16740 
ATCCTACCAC GTCACATCAT AATGTGGAAG AAAAACCTTT TATTATGTCC ATTCATGATA 16800
GAAATTTATT TAGTGGAGAA GAATACAATT ATGATATGTT TAATAGTGGG AATAATCCAA 16 860 
TAAACATTAG TGATTCAACA AATAGTATGG ATAGTCTAAC AAGTAACAAC CATAGTCCAT 16 920 
ATAATGATAA AAATGATTTA TATAGTGGTA TCGACCTAAT CAACGACGCA CTAAGTGGTA 16 98 0 
ATCATATTGA TATATATGAT GAAATGCTCA AACGAAAAGA AAATGAATTA TTTGGAACAA 17 040 
AACATCATAC AAAACATACA AATACATATA ATGTCGCCAA ACCTGCACGT GACGACCCTA 17100
TAACCAATCA AATAAATTTG TTCCATAAAT GGTTAGATAG GCATAGAGAT ATGTGCGAAA 17160
AGTGGAAAAA TAATCACGAA CGGTTACCCA AATTGAAAGA ATTGTGGGAA AATGAGACAC 17220 
ATAGTGGTGA CATAAATAGT GGTATACCTA GTGGTAACCA TGTGTTGAAT ACTGATGTTT 17280 
CTATTCAAAT AGATATGGAT AATCCTAAAA CAAAGAATGA AATTACGAAT ATGGATACAA 17340
ACCCAGACAA ATCTACTATG GATACTATAC TGGATGATCT GGAAAAATAT AATGAACCCT 17400 
ACTACTATGA TTTTTATGAA GATGATATCA TCTATCATGA TGTAGATGTT GAAAAATCAT 17460 
CTATGGATGA TATATATGTG GATCATAATA ATGTGACTAA TAATAATATG GATGTACCTA 1752 0 
CTAAAATGCA CATCGAAATG AATATTGTTA ATAATAAAAA GGAGATTTTC GAAGAGGAAT 175 8 0 
ATCCTATATC AGATATATGG AATATCTAAA ATTAATATAC TTTTTTTGTG TGTGTCATAT 1764 0 
ATATTTTGTA TTATTTGTAT ATGTTTTTAT TTTATTTATT TATTTATTTA TTTATTGTTT 17700 
TTGGTATATT TGTAAAAAAT ATGTTTTTGT TTATAATCAT ATTATTATAT TTTTAATAAT 17760 
TTGCAACATG ATTTTTTTTT TTCTTTCTTA TTGTGTAATT TTTTTCATAA TATTTATATA 1782 0 
TATATATGTA TTTTATTTTT TAGTATAATA ATTGTATCTA TATTTGATTA ATAATTATGT 178 8 0 
ATATTATGGT TATTTTGTTT CTTTTTCTGT ACATTTTTTC GTAATATATA TATATATATA 1794 0 
TATATATAAT TCTCTTTTTC TAATATATAT ATCCTTCTAT TTTCGATTTT TTCATTTTTT 18000 
TCCAGTATTA ATTTATTTAT TTATTTGTGA TATTTTATAA TATATTATTT AAATGTGTAT 18060 
TTATATATGT GTTTTATATA TGTGTTTTAT TTTTGTTACT CTAATTCTGA ATAATCCGAG 1812 0 
CGAAAAAAAA ATATATAATC TCATATAAAA ATTATTTATA ATACAATATT ATATAGTTTC 1818 0 
CTATTAAAAT AAATTAATAT AATATACAAT AATATTTCTT GTTATTTTTA TAAATATAAC 18240 
TAATTTCTTA TTTTTATTTA ACTTTATTCC TTTTTAATTT CTTAATTCTT TTATCAAACA 183 00 
AAAAACATAA AGTAATTCTA CATATCAACA AAAAAAAAAA AAAAAAAAAA AAAAAAAATT 18360 
TATTATAATA TAATAAAAAA TATAAAGACA TACGTTCACT TATTATTATA AATGATTTAT 18420 
TACGATTAAA ACATATTGAG ATTATAATAA TATAATTTAA CATAGAAAGA GTTAAGAATA 18480 
CATTTTTTTT TTTATTTCGA TATGTAATTC AACATATATA TATATATATA TCTTTTTAAT 1854 0 
TTAATTAAAT AAAATTCCTT ATTATTCATA TTGTTTCTTT TATCACATGT GAAATATTAA 18600
AAATAATTTT CGATTTTATC GATATATTTA TGTCGTTTAT ATACTTATAT AGGTCTTTAT 1866 0 
AACTATTGAT TAATAGAAGG TAATAGCCTA ATAATATAAA TACTCGTATT TATAAATTCA 1872 0 
TTTATATATT TCAAATATAT TTGCATGGTT TATTTTCAAA TACAATTAAT TAGATTTCTT 18780 
AAATATTTCT TCATTTATTC ATTTTTATAG CATATACATG CACATTATAA ATTATTAATA 1884 0 
AAAAATTTTT ATTTTAATAT ATAATAACAA TTTTCATACA TTACATTTTT CACACAACAT 18 90 0 
TTAAGTTGTC ATAATGTAAC ACATTAAATA ATATATTACT TATATATATA TAATTATTAA 18 960 
TTATATATTA AATAAAAATG TATTATCGCC TGTATTATCA TAGTATATAT AATGTTGTAT 1902 0 
AACGCTTCAA AATATATATA ATAATATAAT TAAAAATATA TATATAGTAA TTAATTATTT 19080 
TGTTATGTTA TGTAATAATG CAATTAATAT AAGATAAAAT TCAT 19124 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3060 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
Thr Asn Thr His Asn Leu Leu Val Asp Val Cys Met Ala Ala Lys Phe
145 150 155 160

Glu Gly Gln Ser Ile Thr Gln Asp Tyr Pro Lys Tyr Gln Ala Thr Tyr
165 170 175

Gly Asp Ser Pro Ser Gln Ile Cys Thr Met Leu Ala Arg Ser Phe Ala
180 185 190

Asp Ile Gly Asp Ile Val Arg Gly Arg Asp Leu Tyr Leu Gly Asn Pro
195 200 205

Ile Lys Gln Arg Gln Gln Leu Glu Asn Asn Leu Lys Thr Ile
210 215 220

Phe Gly Lys Ile Tyr Glu Lys Leu Asn Gly Ala Glu Ala Arg Tyr Gly
225 230 235 240

Asn Asp Pro Glu Phe Phe Lys Leu Arg Glu Asp Trp Trp Thr Ala Asn
245 250 255

Arg Glu Thr Val Trp Lys Ala Ile Thr Cys Asn Ala Trp Gly Asn Thr
260 265 270

Tyr Phe His Ala Thr Cys Asn Arg Gly Glu Arg Thr Lys Gly Tyr Cys
275 280 285

Thr Tyr Phe Asp Tyr Val Pro Gln
290 295 300

Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg Lys Lys
305 310 315 320

Asn Lys Lys Ile Lys Asp Val Lys Arg Asn Cys Arg Gly Lys Asp Lys
325 330 335

Glu Asp Lys Asp Arg Tyr Cys Ser Arg Asn Gly Tyr Asp Cys Glu Lys
340 345 350

Thr Lys Arg Ala Ile Gly Lys Leu Arg Tyr Gly Lys Gln Cys Ile Ser
355 360 365

Leu Tyr Ala Cys Asn Pro Tyr Val Asp Trp Ile Asn Asn Gln Lys
370 375 380

Glu Gln Phe Asp Lys Gln Lys Lys Lys Tyr Asp Glu Glu Ile Lys Lys
385 390 395 400

Tyr Glu Asn Gly Ala Ser Gly Gly Ser Arg Gln Lys Arg Asp Ala Gly
405 410 415

Gly Thr Thr Thr Thr Asn Tyr Asp Gly Tyr Glu Lys Lys Phe Tyr Asp
420 425 430

Glu Leu Asn Lys Ser Glu Tyr Arg Thr Val Asp Lys Phe Leu Glu Lys
435 440 445

Ser Asn Glu Glu Ile Cys Thr Lys Val Lys Asp Glu Glu Gly Gly
450 455 460

Thr Ile Asp Phe Lys Asn Val Asn Ser Asp Ser Thr Ser Gly Ala Ser
465 470 475 480

Gly Thr Asn Val Glu Ser Gln Gly Thr Phe Tyr Arg Ser Lys Tyr Cys
485 490 495

Gln Pro Cys Pro Tyr Cys Gly Val Lys Lys Val Asn Asn Gly Gly Ser
500 505 510

Ser Asn Glu Trp Glu Glu Lys Asn Asn Gly Lys Cys Lys Ser Gly Lys
515 520 525
Leu Tyr Glu Pro Lys Pro Asp Lys Glu Gly Thr Thr Ile Thr Ile Leu
530 535 540

Lys Ser Gly Lys Gly His Asp Asp Ile Glu Glu Lys Leu Asn Lys Phe
545 550 555 560

Cys Asp Glu Lys Asn Gly Asp Thr Ile Asn Ser Gly Gly Ser Gly Thr
565 570 575

Gly Gly Ser Gly Gly Gly Asn Ser Gly Arg Gln Glu Leu Tyr Glu Glu
580 585 590

Trp Lys Cys Tyr Lys Gly Glu Asp Val Val Lys Val Gly His Asp Glu
595 600 605

Asp Asp Glu Glu Asp Tyr Glu Asn Val Lys Asn Ala Gly Gly Leu Cys
610 615 620

Ile Leu Lys Asn Gln Lys Lys Asn Lys Glu Glu Gly Gly Asn Thr Ser
625 630 635 640

Glu Lys Glu Pro Asp Glu Ile Gln Lys Thr Phe Asn Pro Phe Phe Tyr
645 650 655

Tyr Trp Val Ala His Met Leu Lys Asp Ser Ile His Trp Lys Lys Lys
660 665 670

Leu Gln Arg Cys Leu Gln Asn Gly Asn Arg Ile Lys Cys Gly Asn Asn
675 680 685

Lys Cys Asn Asn Asp Cys Glu Cys Phe Lys Arg Trp Ile Thr Gln Lys
690 695 700

Lys Asp Glu Trp Gly Lys Ile Val Gln His Phe Lys Thr Gln Asn Ile
705 710 715 720

Lys Gly Arg Gly Gly Ser Asp Asn Thr Ala Glu Leu Ile Pro Phe Asp
725 730 735

His Asp Tyr Val Leu Gln Tyr Asn Leu Gln Glu Glu Phe Leu Lys Gly
740 745 750

Asp Ser Glu Asp Ala Ser Glu Glu Lys Ser Glu Asn Ser Leu Asp Ala
755 760 765

Glu Glu Ala Glu Glu Leu Lys His Leu Arg Glu Ile Ile Glu Ser Glu
770 775 780

Asp Asn Asn Gln Glu Ala Ser Val Gly Gly Gly Val Thr Glu Gln Lys
785 790 795 800

Asn Ile Met Asp Lys Leu Leu Asn Tyr Glu Lys Asp Glu Ala Asp Leu
805 810 815

Cys Leu Glu Ile His Glu Asp Glu Glu Glu Glu Lys Glu Lys Gly Asp
820 825 830

Gly Asn Glu Cys Ile Glu Glu Gly Glu Asn Phe Arg Tyr Asn Pro Cys
835 840 845

Arg Tyr Pro Val Leu Ala Asn Lys Val
850 855 860

Ala Tyr Gln Met His His Lys Ala Lys Thr Gln Leu Ala Ser Arg Ala
865 870 875 880

Gly Arg Ser Ala Leu Arg Gly Asp Ile Ser Leu Ala Gln Phe Lys Asn
885 890 895

Gly Arg Asn Gly Ser Thr Leu Lys Gly Gln Ile Cys Lys Ile Asn Glu
900 905 910

Asn Tyr Ser Asn Asp Ser Arg Gly Asn Ser Gly Gly Pro Cys Thr Gly
915 920 925

Lys Asp Gly Asp His Gly Gly Val Arg Met Arg Ile Gly Thr Glu Trp
930 935 940

Ser Asn Ile Glu Gly Lys Lys Gln Thr Ser Tyr Lys Asn Val Phe Leu
945 950 955 960

Pro Pro Arg Arg Glu His Met Cys Thr Ser Asn Leu Glu Asn Leu Asp
965 970 975

Val Gly Ser Val Thr Lys Asn Asp Lys Ala Ser His Ser Leu Leu Gly
980 985 990

Asp Val Gln Leu Ala Ala Lys Thr Asp Ala Ala Glu Ile Ile Lys Arg
995 1000 1005

Tyr Lys Asp Gln Asn Asn Ile Gln Leu Thr Asp Pro Ile Gln Gln Lys
1010 1015 1020

Asp Gln Glu Ala Met Cys Arg Ala Val Arg Tyr Ser Phe Ala Asp Leu
1025 1030 1035 1040

Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp Glu Asp Lys Ser Ser
1045 1050 1055

Thr Asp Met Glu Thr Arg Leu Ile Thr Val Phe Lys Asn Ile Lys Glu
1060 1065 1070

Lys His Asp Gly Ile Lys Asp Asn Pro Lys Tyr Thr Gly Asp Glu Ser
1075 1080 1085

Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp Trp Trp Glu Ala Asn
1090 1095 1100

Arg His Gln Val Trp Arg Ala Met Lys Cys Ala Thr Lys Gly Ile Ile
1105 1110 1115 1120

Cys Pro Gly Met Pro Val Asp Asp Tyr Ile Pro Gln Arg Leu Arg Trp
1125 1130 1135

Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala Gln Ser Gln Glu Tyr
1140 1145 1150

Asp Lys Leu Lys Lys Ile Cys Ala Asp Cys Met Ser Lys Gly Asp Gly
1155 1160 1165

Lys Cys Thr Gln Gly Asp Val Asp Cys Gly Lys Cys Lys Ala Ala Cys
1170 1175 1180
Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp Asn Glu Gln Trp Arg Lys
1185 1190 1195 1200

Ile Ser Asp Lys Tyr Asn Leu Leu Tyr Leu Gln Ala Lys Thr Thr Ser
1205 1210 1215

Thr Asn Pro Gly Arg Thr Val Leu Gly Asp Asp Asp Pro Asp Tyr Gln
1220 1225 1230

Gln Met Val Asp Phe Leu Thr Pro Ile His Lys Ala Ser Ile Ala Ala
1235 1240 1245

Arg Val Leu Val Lys Arg Ala Ala Gly Ser Pro Thr Glu Ile Ala Ala
1250 1255 1260

Ala Ala Pro Ile Thr Pro Tyr Ser Thr Ala Ala Gly Tyr Ile His Gln
1265 1270 1275 1280

Glu Ile Gly Tyr Gly Gly Cys Gln Glu Gln Thr Gln Phe Cys Glu Lys
1285 1290 1295

Lys His Gly Ala Thr Ser Thr Ser Thr Thr Lys Glu Asn Lys Glu Tyr
1300 1305 1310

Thr Phe Lys Gln Pro Pro Pro Glu Tyr Ala Thr Ala Cys Asp Cys Ile
1315 1320 1325

Asn Arg Ser Gln Thr Glu Glu Pro Lys Lys Lys Glu Glu Asn Val Glu
1330 1335 1340

Ser Ala Cys Lys Ile Val Glu Lys Ile Leu Glu Gly Lys Asn Gly Arg
1345 1350 1355 1360

Thr Thr Val Gly Glu Cys Asn Pro Lys Glu Ser Tyr Pro Asp Trp Asp
1365 1370 1375

Cys Lys Asn Asn Ile Asp Ile Ser His Asp Gly Ala Cys Met Pro Pro
1380 1385 1390

Arg Arg Gln Lys Leu Cys Leu Tyr Tyr Ile Ala His Glu Ser Gln Thr
1395 1400 1405

Glu Asn Ile Lys Thr Asp Asp Asn Leu Lys Asp Ala Phe Ile Lys Thr
1410 1415 1420

Ala Ala Ala Glu Thr Phe Leu Ser Trp Gln Tyr Tyr Lys Ser Lys Asn
1425 1430 1435 1440

Asp Ser Glu Ala Lys Ile Leu Asp Arg Gly Leu Ile Pro Ser Gln Phe
1445 1450 1455

Leu Arg Ser Met Met Tyr Thr Phe Gly Asp Tyr Arg Asp Ile Cys Leu
1460 1465 1470

Asn Thr Asp Ile Ser Lys Lys Gln Asn Asp Val Ala Lys Ala Lys Asp
1475 1480 1485

Lys Ile Gly Lys Phe Phe Ser Lys Asp Gly Ser Lys Ser Pro Ser Gly
1490 1495 1500

Leu Ser Arg Gln Glu Trp Trp Lys Thr Asn Gly Pro Glu Ile Trp Lys
1505 1510 1515 1520

Gly Met Leu Cys Ala Leu Thr Lys Tyr Val Thr Asp Thr Asp Asn Lys
1525 1530 1535

Arg Lys Ile Lys Asn Asp Tyr Ser Tyr Asp Lys Val Asn Gln Ser Gln
1540 1545 1550

Asn Gly Asn Pro Ser Leu Glu Glu Phe Ala Ala Lys Pro Gln Phe Leu
1555 1560 1565

Arg Trp Met Ile Glu Trp Gly Glu Glu Phe Cys Ala Glu Arg Gln Lys
1570 1575 1580

Lys Glu Asn Ile Ile Lys Asp Ala Cys Asn Glu Ile Asn Ser Thr Gln
1585 1590 1595 1600

Gln Cys Asn Asp Ala Lys His Arg Cys Asn Gln Ala Cys Arg Ala Tyr
1605 1610 1615

Gln Glu Tyr Val Glu Asn Lys Lys Lys Glu Phe Ser Gly Gln Thr Asn
1620 1625 1630

Asn Phe Val Leu Lys Ala Asn Val Gln Pro Gln Asp Pro Glu Tyr Lys
1635 1640 1645

Gly Tyr Glu Tyr Lys Asp Gly Val Gln Pro Ile Gln Gly Asn Glu Tyr
1650 1655 1660

Leu Leu Gln Lys Cys Asp Asn Asn Lys Cys Ser Cys Met Asp Gly Asn
1665 1670 1675 1680

Val Leu Ser Val Ser Pro Lys Glu Lys Pro Phe Gly Lys Tyr Ala His
1685 1690 1695

Lys Tyr Pro Glu Lys Cys Asp Cys Tyr Gln Gly Lys His Val Pro Ser
1700 1705 1710

Ile Pro Pro Pro Pro Pro Pro Val Gln Pro Gln Pro Glu Ala Pro Thr
1715 1720 1725

Val Thr Val Asp Val Cys Ser Ile Val Lys Thr Leu Phe Lys Asp Thr
1730 1735 1740

Asn Asn Phe Ser Asp Ala Cys Gly Leu Lys Tyr Gly Lys Thr Ala Pro
1745 1750 1755 1760

Ser Ser Trp Lys Cys Ile Pro Ser Asp Thr Lys Ser Gly Ala Gly Ala
1765 1770 1775

Thr Thr Gly Lys Ser Gly Ser Asp Ser Gly Ser Ile Cys Ile Pro Pro
1780 1785 1790

Arg Arg Arg Arg Leu Tyr Val Gly Lys Leu Gln Glu Trp Ala Thr Ala
1795 1800 1805

Gln Gly Glu Gly Ala Ala Pro Ser His Ser Arg Ala Asp Asp
1810 1815 1820

Leu Arg Asn Ala Phe Ile Gln Ser Ala Ala Ile Glu Thr Phe Phe Leu
1825 1830 1835 1840

Trp Asp Arg Tyr Lys Glu Glu Lys Lys Pro Gln Gly Asp Gly Ser Gln
1845 1850 1855

Gln Ala Leu Ser Gln Leu Thr Ser Thr Tyr Ser Asp Asp Glu Glu Asp
1860 1865 1870

Pro Pro Asp Lys Leu Leu Gln Asn Gly Lys Ile Pro Pro Asp Phe Leu
1875 1880 1885

Arg Leu Met Phe Tyr Thr Leu Gly Asp Tyr Arg Asp Ile Leu Val His
1890 1895 1900

Gly Gly Asn Thr Ser Asp Ser Gly Asn Thr Asn Gly Ser Asn Asn Asn
1905 1910 1915 1920

Asn Ile Val Leu Glu Ala Ser Gly Asn Lys Glu Asp Met Gln Lys Ile
1925 1930 1935

Gln Glu Lys Ile Glu Gln Ile Leu Pro Lys Asn Gly Gly Thr Pro Leu
1940 1945 1950

Val Pro Lys Ser Ser Ala Gln Thr Pro Asp Lys Trp Trp Asn Glu His
1955 1960 1965

Ala Glu Ser Ile Trp Lys Gly Met Ile Cys Ala Leu Thr Tyr Thr Glu
1970 1975 1980

Lys Asn Pro Asp Thr Ser Ala Arg Gly Asp Glu Asn Lys Ile Glu Lys
1985 1990 1995 2000

Asp Asp Glu Val Tyr Glu Lys Phe Phe Gly Ser Thr Ala Asp Lys His
2005 2010 2015

Gly Thr Ala Ser Thr Pro Thr Gly Thr Tyr Lys Thr Gln Tyr Asp Tyr
2020 2025 2030

Glu Lys Val Lys Leu Glu Asp Thr Ser Gly Ala Lys Thr Pro Ser Ala
2035 2040 2045

Ser Ser Asp Thr Pro Leu Leu Ser Asp Phe Val Leu Arg Pro Pro Tyr
2050 2055 2060

Phe Arg Tyr Leu Glu Glu Trp Gly Gln Asn Phe Cys Lys Lys Arg Lys
2065 2070 2075 2080

His Lys Leu Ala Gln Ile Lys His Glu Cys Lys Val Glu Glu Asn Gly
2085 2090 2095

Gly Gly Ser Arg Arg Gly Gly Ile Thr Arg Gln Tyr Ser Gly Asp Gly
2100 2105 2110

Glu Ala Cys Asn Glu Met Leu Pro Lys Asn Asp Gly Thr Val Pro Asp
2115 2120 2125

Leu Glu Lys Pro Ser Cys Ala Lys Pro Cys Ser Ser Tyr Arg Lys Trp
2130 2135 2140

Ile Glu Ser Lys Gly Lys Glu Phe Glu Lys Gln Glu Lys Ala Tyr Glu
2145 2150 2155 2160

Gln Gln Lys Asp Lys Cys Val Asn Gly Ser Asn Lys His Asp Asn Gly
2165 2170 2175

Phe Cys Glu Thr Leu Thr Thr Ser Ser Lys Ala Lys Asp Phe Leu Lys
2180 2185 2190

Thr Leu Gly Pro Cys Lys Pro Asn Asn Val Glu Gly Lys Thr Ile Phe
2195 2200 2205

Asp Asp Asp Lys Thr Phe Lys His Thr Lys Asp Cys Asp Pro Cys Leu
2210 2215 2220
Lys Phe Ser Val Asn Cys Lys Lys Asp Glu Cys Asp Asn Ser Lys Gly
2225 2230 2235 2240

Thr Asp Cys Arg Asn Lys Asn Ser Ile Asp Ala Thr Asp Ile Glu Asn
2245 2250 2255

Gly Val Asp Ser Thr Val Leu Glu Met Arg Val Ser Ala Asp Ser Lys
2260 2265 2270

Ser Gly Phe Asn Gly Asp Gly Leu Glu Asn Ala Cys Arg Gly Ala Gly
2275 2280 2285

Ile Phe Glu Gly Ile Arg Lys Asp Glu Trp Lys Cys Arg Asn Val Cys
2290 2295 2300

Gly Tyr Val Val Cys Lys Pro Glu Asn Val Asn Gly Glu Ala Lys Gly
2305 2310 2315 2320

Lys His Ile Ile Gln Ile Arg Ala Leu Val Lys Arg Trp Val Glu Tyr
2325 2330 2335

Phe Phe Glu Asp Tyr Asn Lys Ile Lys His Lys Ile Ser His Arg Ile
2340 2345 2350

Lys Asn Gly Glu Ile Ser Pro Cys Ile Lys Asn Cys Val Glu Lys Trp
2355 2360 2365

Val Asp Gln Lys Arg Lys Glu Trp Lys Glu Ile Thr Glu Arg Phe Lys
2370 2375 2380

Asp Gln Tyr Lys Asn Asp Asn Ser Asp Asp Asp Asn Val Arg Ser Phe
2385 2390 2395 2400

Leu Glu Thr Leu Ile Pro Gln Ile Thr Asp Ala Asn Ala Lys Asn Lys
2405 2410 2415

Val Ile Lys Leu Ser Lys Phe Gly Asn Ser Cys Gly Cys Ser Ala Ser
2420 2425 2430

Ala Asn Glu Gln Asn Lys Asn Gly Glu Tyr Lys Asp Ala Ile Asp Cys
2435 2440 2445

Met Leu Lys Lys Leu Lys Asp Lys Ile Gly Glu Cys Glu Lys Lys His
2450 2455 2460

His Gln Thr Ser Asp Thr Glu Cys Ser Asp Thr Pro Gln Pro Gln Thr
2465 2470 2475 2480

Leu Glu Asp Glu Thr Leu Asp Asp Asp Ile Glu Thr Glu Glu Ala Lys
2485 2490 2495

Lys Asn Met Met Pro Lys Ile Cys Glu Asn Val Leu Lys Thr Ala Gln
2500 2505 2510

Gln Glu Asp Glu Gly Gly Cys Val Pro Ala Glu Asn Ser Glu Glu Pro
2515 2520 2525

Ala Ala Thr Asp Ser Gly Lys Glu Thr Pro Glu Gln Thr Pro Val Leu
2530 2535 2540

Lys Pro Glu Glu Glu Ala Val Pro Glu Pro Pro Pro Pro Pro Pro Gln
2545 2550 2555 2560

Glu Lys Ala Pro Ala Pro Ile Pro Gln Pro Gln Pro Pro Thr Pro Pro
2565 2570 2575

Thr Gln Leu Leu Asp Asn Pro His Val Leu Thr Ala Leu Val Thr Ser
2580 2585 2590

Thr Leu Ala Trp Ser Val Gly Ile Gly Phe Ala Thr Phe Thr Tyr Phe
2595 2600 2605

Tyr Leu Lys Lys Lys Thr Lys Ser Ser Val Gly Asn Leu Phe Gln Ile
2610 2615 2620

Leu Gln Ile Pro Lys Ser Asp Tyr Asp Ile Pro Thr Lys Leu Ser Pro
2625 2630 2635 2640

Asn Arg Tyr Ile Pro Tyr Thr Ser Gly Lys Tyr Arg Gly Lys Arg Tyr
2645 2650 2655

Ile Tyr Leu Glu Gly Asp Ser Gly Thr Asp Ser Gly Tyr Thr Asp His
2660 2665 2670

Tyr Ser Asp Ile Thr Ser Ser Glu Ser Glu Tyr Glu Glu Met Asp Ile
2675 2680 2685

Asn Asp Ile Tyr Val Pro Gly Ser Pro Lys Tyr Lys Thr Leu Ile Glu
2690 2695 2700

Val Val Leu Glu Pro Ser Gly Asn Asn Thr Thr Ala Ser Gly Asn Asn
2705 2710 2715 2720

Thr Thr Ala Ser Gly Asn Asn Thr Thr Ala Ser Gly Lys Asn Thr Pro
2725 2730 2735

Ser Asp Thr Gln Asn Asp Ile Gln Asn Asp Gly Ile Pro Ser Ser Lys
2740 2745 2750

Ile Thr Asp Asn Glu Trp Asn Gln Leu Lys Asp Glu Phe Ile Ser Gln
2755 2760 2765

Tyr Leu Gln Ser Glu Pro Asn Thr Glu Pro Asn Met Leu Gly Tyr Asn
2770 2775 2780

Val Asp Asn Asn Thr His Pro Thr Thr Ser His His Asn Val Glu Glu
2785 2790 2795 2800

Lys Pro Phe Ile Met Ser Ile His Asp Arg Asn Leu Phe Ser Gly Glu
2805 2810 2815

Glu Tyr Asn Tyr Asp Met Phe Asn Ser Gly Asn Asn Pro Ile Asn Ile
2820 2825 2830

Ser Asp Ser Thr Asn Ser Met Asp Ser Leu Thr Ser Asn Asn His Ser
2835 2840 2845

Pro Tyr Asn Asp Lys Asn Asp Leu Tyr Ser Gly Ile Asp Leu Ile Asn
2850 2855 2860

Asp Ala Leu Ser Gly Asn His Ile Asp Ile Tyr Asp Glu Met Leu Lys
2865 2870 2875 2880

Arg Lys Glu Asn Glu Leu Phe Gly Thr Lys His His Thr Lys His Thr
2885 2890 2895

Asn Thr Tyr Asn Val Ala Lys Pro Ala Arg Asp Asp Pro Ile Thr Asn
2900 2905 2910

Gln Ile Asn Leu Phe His Lys Trp Leu Asp Arg His Arg Asp Met Cys
2915 2920 2925

Glu Lys Trp Lys Asn Asn His Glu Arg Leu Pro Lys Leu Lys Glu Leu
2930 2935 2940

Trp Glu Asn Glu Thr His Ser Gly Asp Ile Asn Ser Gly Ile Pro Ser
2945 2950 2955 2960

Gly Asn His Val Leu Asn Thr Asp Val Ser Ile Gln Ile Asp Met Asp
2965 2970 2975

Asn Pro Lys Thr Lys Asn Glu Ile Thr Asn Met Asp Thr Asn Pro Asp
2980 2985 2990

Lys Ser Thr Met Asp Thr Ile Leu Asp Asp Leu Glu Lys Tyr Asn Glu
2995 3000 3005

Pro Tyr Tyr Tyr Asp Phe Tyr Glu Asp Asp Ile Ile Tyr His Asp Val
3010 3015 3020

Asp Val Glu Lys Ser Ser Met Asp Asp Ile Tyr Val Asp His Asn Asn
3025 3030 3035 3040

Val Thr Asn Asn Asn Met Asp Val Pro Thr Lys Met His Ile Glu Met
3045 3050 3055

Asn Ile Val Asn
3060

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7295 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TCCAAGCTGT TTTTTTTTCT TTTTCTAGTT TTTCCATTGT ATATTCGTCA AATACGTACA 60

CATATATATA TATATGTATA ACATGTGAGT ATTATTTTAT ACATCACATC GATTACATTT 12 0 

TAGCGTTTTT TTTCCCCAGA TCACATATAG TACGACTAAG AAACAAAATA ACATCATAAC 18 0 

AAACATAGTG ATTATGAATA CATGATATTA CCACATAATA TAAAGTATTA AATAATATTA 240

TTGCATGTTA GTGATAACTA CTATATCATA TACACCACTA CTAACTATCA CTACATAGTA 3 00 

ACAGTAGTAG TCACAATCAT AGCATCATGG TAATATAGAT TTTCATTTCA TATCTTCCTT 360 

ATTGTTTGTT ATACATACAC TATTAATATG TATTTATGTT ATAATGGTAG ACTATGTTAA 42 0 

CAATGTATGA ATGACCATCA TAAATTAATA ACAGACGCAT CAAAACAGTG TATATGTGTG 480 

CATTTATGAC ATAATGTAGT CGGGAAGCAT ACAAAAATGG AGCCAGGAGG TAGCGGTGGT 540
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CGTGGTAGTG GCGGTAGTAG TAGTGGTAAA GGGAAGAAGG ATACATCTGA GTATATTTAT 600 
GTGAGCGATG CTAAGGATCT TTTGGATAGA GTTGGAGAAA AAGTGTACGA AGAAAAAGTG 66 0 
AAAAATGGTG ATGCTAAAAA ATATATTGAG GCGTTGAAAG GAAATTTGAA CACAGCAAAT 72 0 
GGTCGTAGTT CGGAAACAGC TAGCAGTATT GAAACGTGCA CCCTTGTAAA AGAATATTAT 780
GAGCGTGTTA ATGGTGATGG TAAAAGGCAT CCGTGCAGAA AAGACGCAAA AAATGAAGAT 84 0 
GTAAACCGTT TTTCGGATAC ACTTGGTGGC CAATGTACAT ACAATAGGAT AAAAGATAGT 900 
CAACAGGGTG ATAATAAAGT AGGAGCCTGT GCTCCGTATA GACGATTACA TTTATGTGAT 96 0 
TATAATTTGG AATCTATAGA CACAACGTCG ACGACGCATA AGTTGTTGTT AGAGGTGTGT 102 0 
ATGGCAGCAA AATACGAAGG AAACTCAATA AATACACATT ATACACAACA TCAACGAACT 108 0 
AATGAGGATT CTGCTTCCCA ATTATGTACT GTATTAGCAC GAAGTTTTGC AGATATAGGT 114 0 
GATATCGTAA GAGGAAAAGA TCTATATCTC GGTTATGATA ATAAAGAAAA AGAACAAAGA 12 00 
AAAAAATTAG AACAGAAATT GAAAGATATT TTCAAGAAAA TACATAAGGA CGTGATGAAG 126 0 
ACGAATGGCG CACAAGAACG CTACATAGAT GATGCCAAAG GAGGAGATTT TTTTCAATTA 132 0 
AGAGAAGATT GGTGGACGTC GAATCGAGAA ACAGTATGGA AAGCATTAAT ATGTCATGCA 13 8 0 
CCAAAAGAAG CTAATTATTT TATAAAAACA GCGTGTAATG TAGGAAAAGG AACTAATGGT 1440
CAATGCCATT GCATTGGTGG AGATGTTCCC ACATATTTCG ATTATGTGCC GCAGTATCTT 15 00 
CGCTGGTTCG AGGAATGGGC AGAAGACTTT TGCAGGAAAA AAAAAAAAAA ACTAGAAAAT 156 0 
TTGCAAAAAC AGTGTCGTGA TTACGAACAA AATTTATATT GTAGTGGTAA TGGCTACGAT 162 0 
TGCACAAAAA CTATATATAA AAAAGGTAAA CTTGTTATAG GTGAACATTG TACAAACTGT 16 80 
TCTGTTTGGT GTCGTATGTA TGAAACTTGG ATAGATAACC AGAAAAAAGA ATTTCTAAAA 1740
CAAAAAAGAA AATACGAAAC AGAAATATCA GGTGGTGGTA GTGGTAAGAG TCCTAAAAGG 18 0 0 
ACAAAACGGG CTGCACGTAG TAGTAGTAGT AGTGATGATA ATGGGTATGA AAGTAAATTT 186 0 
TATAAAT^C TGAAAGAAGT TGGCTACCAA GATGTCGATA AATTTTTAAA AATATTAAAC 1920
AAAGAAGGAA TATGTCAAAA ACAACCTCAA GTAGGAAATG AAAAAGCAGA TAATGTTGAT 1980 
TTTACTAATG AAAAATATGT AAAAACATTT TCTCGTACAG AAATTTGTGA ACCGTGCCCA 2040
TGGTGTGGAT TGGAAAAAGG TGGTCCACCA TGGAAAGTTA AAGGTGACAA AACCTGCGGA 210 0 
AGTGCAAAAA CAAAGACATA CGATCCTAAA AATATTACCG ATATACCAGT ACTCTACCCT 2160 
GATAAATCAC AGCAAAATAT ACTAAAAAAA TATAAAAATT TTTGTGAAAA AGGTGCACCT 2220
GGTGGTGGTC AAATTAAAAA ATGGCAATGT TATTATGATG AACATAGGCC TAGTAGTAAA 22 8 0 
AATAATAATA ATTGTGTAGA AGGAACATGG GACAAGTTTA CACAAGGTAA ACAAACCGTT 2340
AAGTCCTATA ATGTTTTTTT TTGGGATTGG GTTCATGATA TGTTACACGA TTCTGTAGAG 2400 
TGGAAGACAG AACTTAGTAA GTGTATAAAT AATAACACTA ATGGCAAGAC ATGTAGAAAC 24 60 
AATAATAAAT GTAAAACAGA TTGTGGTTGT TTTCAAAAAT GGGTTGAAAA AAAACAACAA 2520
GAATGGATGG CAATAAAAGA CCATTTTGGA AAGCAAACAG ATATTGTCCA ACAAAAAGGT 2580
CTTATCGTAT TTAGTCCCTA TGGAGTTCTT GACCTTGTTT TGAAGGGCGG TAATCTGTTG 2640
CAAAATATTA AAGATGTTCA TGGAGATACA GATGACATAA AACACATTAA GAAACTGTTG 2 700 
GATGAGGAAG ACGCAGTAGC AGTTGTTGTT GGTGGCAAGG ACAATACCAC AATTGATAAA 2 76 0 
TTACTACAAC ACGAAAAAGA ACAAGCAGAA CAATGCAAAC AAAAGCAGGA AGAATGCGAG 2 82 0 
AAAAAAGCAC AACAAGAAAG TCGTGGTCGC TCCGCCGAAA CCCGCGAAGA CGAAAGGACA 2880 
CAACAACCTG CTGATAGTGC CGGCGAAGTC GAAGAAGAAG AAGACGACGA CGACTACGAC 2940
GAAGACGACG AAGATGACGA CGTAGTCCAG GAGGAGGAAG AGGGAAAGGA GGAAGGAACG 3 00 0 
GTCACAGAGG TAACAGAGGT AACAGAGGTC GTGGAAGAGA CGGTAACAGA ACAGGAAGGG 3 06 0 
GTGAAGCCAT GTGACATAGT GGGCAAACTA TTTGAGGACG ACAAAAGTCT CAAAGAGGCA 312 0 
TGTGGTCTAA AATACGGTCC AGGTGGAAAA GAAAAATTCC CCAATTGGAA GTGTGTCACA 318 0 
CCAAGTGGTG TCAGTACTGC CACTAGTGGA AAAGACGGCG CTATATGTGT GCCACCCAGG 3240
AGACGACGAT TATACGTAGG TGGTTTATCA CAATGGGCAA GTCGTGGTGG TGACGAGACC 33 00 
ACGGAGGTGT CGAGTGAAGC CACTTCGGCG CCGTCACAGT CAGAAAGTGA AAAACTACGT 33 6 0 
ACTGCGTTTA TTGAGTCCGC TGCAATAGAG ACGTTTTTTT TGTGGCATAA GTATAAAGAA 342 0 
GAGAAAAAAC CACCAGCAAC ACAAGATGGA GCGGGACTTG GAGTATCACT CCCAGAACCG 34 8 0 
TCACCACCGG GAGAGGACCC CCAAACACAA TTACAACAAA CTGGTGTTAT ACCCCCCGAT 3540
TTTTTGCGTC AAATGTTTTA TACATTAGCA GACTACAAAG ACATATTATA CAGTGGTAGT 36 00 
AACGACACAA GTGACACAAC TGGTAAACAG ACACCTAGTA GTAGTAATGA CAACCTCAAA 3 66 0 
AATATTGTTC TGGAAGCAAG TGGTAGTACT GAGCAGGAGA AGGAGAAAAT GAAACAAATA 3 72 0 
CAAGCGAAAA TAAAAAAAAT TTTAAACGGT GCCACATCTG GTGTCCCACC TGTCACCAAA 378 0 
AATAGTGTCA AAACCCCCCA ACAAACCTGG TGGGAAAACA TCGCGAAGGA TATCTGGAAT 3840
GCTATGGTAT GTGCACTAAC ATATAAAGAA AATGACGCCA GAGGCACAAG TGCCAAAATA 3 900 
GAACAGAATA AGGATTTGAA AAAGGCACTT TGGGACGAAG CCAACAAAAA CACCCCCATA 3 96 0 
GAGAAATACC AATACACAAA TGTCAAACTC GAAGATGAAA GTGGTGCCAA AAGCAACGAC 4 02 0 
ACCATCCAAC CCCCCACGTT AAAAAATTTT GTGGAAATAC CTACATTTTT TCGTTGGTTA 408 0 
CATGAGTGGG GAAACAGTTT TTGTTTTGAG AGAGCAAAAC GATTGGCACA AATAAAACAT 4140
GAGTGTATGG ATGAGGATGG TGAAAAACAA TATAGTGGGG ATGGGGAATA TTGTGAAGAA 42 00 
ATTTTTAGTA AGCAATATAA TGTTCTCCAG GATTTAAGTT CCAGTTGCGC TAAACCTTGT 426 0 
AGATTGTATA AAACGTGGAT AGAAAAAAAA AAAACAGAAT ATGAGAAACA ACAAAAGGCA 43 2 0 
TATGAACAAC AAAAAAGTAA TTACGAAAAT GAACAAAAAG ACAAATGCCA AACACAAAGT 43 8 0 
AATAATAATG CTAATGAATT TTCTAGAACA CTAGGAGCGT CCCCTACAGC TGCAGAATTT 4440
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TTACAAAAGT TAGGATCATG TAAAAATGAT AATGGATATG AGAATGGAGA GGATAATAAA 45 00 
ATAGATTTTA AAAATCCAGA TAAAACATTT AAGGAAGCAC ACAGTTGTGA TCCATGTCCT 4 56 0 
ATAACTGGAG TTAAATGTCA AAATGGTCAT TGTGTGGGTT CTGCTAATGG AAAGGAGTGC 462 0 
AAAAACAATA AGATTACTGC AGAAGATATT AAAAATAAGA CAGATCCTAA TGGAAACATPr 46 8 0 
GAAATGGTTG TCAGTGATGA CAGTACAAAT ACATTTGAAC ATTTAGGCGA TTGTAAAAGC 4 74 0 
TCAGGTATCT TTAAAGGTAT CAGAAAAGAT GAATGGAAAT GCGCTAATGT ATGTGGTGTA 48 00 
GATATATGTA CTCTGGAAAA AAAAATTAAG AATGGGCAAG AAGGTGATAA AAAATATATC 486 0 
ACAATGAAAG AATTGCTTAA ACGATGGCTA GAATATTTTT TAGAAGATTA TAATAGAATT 4 92 0 
AGAAAAAAAA TAAAGCTATG TACGAAAAAG GAAGATGGAT GCAAATGTAT AAAAGGTTGT 4 98 0 
ATAGAAAAAT GGGTACAAGA AAAAACGAAA GAATGGCAAA AAATAAACGA TACTTATCTT 504 0 
GAACAATATA AAAATGATGA TGGTAATACT TTAACTAATT TTTTGGAGCA ATTCCAATAT 5100 
CGAACTGAAT TTAAAAACGC TATAAAACCT TGTGATGGTT TAGACCAGTT CAAGACTTCG 5160
TGTGGTCTTA ATAGTACTGA TAATTCACAA AATGGTAATA ATAACGATCT TGTTCTATGT 522 0 
TTGCTTAATA AACTTCAAAA AAAAATTAGT GAGTGTAAAG AACAACATAG TGGCCAAACC 52 80 
CAAACACCGT GTGATAACTC TTCCCTTAGT GGTAAAGAAT CCACCCTCGT TGAAGACGTT 53 4 0 
GATGATTATG AGGAACAAAA CCCAGAAAAC AAAGTGGAAC AACCTAAATT TTGTCCAGAT 54 00 
ATGAAAGAAC CAAAAAAAGA AAACGATGAA GAAGTAGGCA CTTGTGGCGG AGACGAAGAA 54 6 0 
AAAAAAAAAG TGGAAGACAG TGTAATCGAA CAAAAAGAGG AAGAAGCAGC TAGTGCCCCA 552 0 
GAGGAATCTC CTCCATTAAC CCCGGAAGCA CCAAAAAAAG AGGAAAATGT GGTACCAAAA 558 0 
CCACCACCAC CACCAAAAAA ACGCCGAATC AAAACCCGTA ATGTGTTGGA CCACCCCGCT 5640
GTCATACCCG CCCTCATGTC TTCTACCATC ATGTGGAGTA TTGGCATCGG TTTTGCTGCG 57 00 
TTCACTTATT TTTATCTAAA GAAAAAAACC AAATCATCTG TTGGAAATTT ATTCCAAATA 5760 
CTGCAAATAC CCAAAAGTGA TTATGATATA CCTACATTGA AATCAAGCAA TCGTTATATA 5 82 0 
CCCTATGCAA GTGATAGACA TAAAGGCAAA ACATATATTT ATATGGAAGG AGATAGCAGT 58 8 0 
GGAGATGAAA AATATGCATT TATGTCTGAT ACTACTGATA TAACTTCATC CGAAAGTGAG 5940
TATGAAGAAT TGGATATTAA TGATATATAT GTACCAGGTA GTCCTAAATA TAAAACATTG 6 000 
ATAGAAGTAG TACTTGAACC ATCAAAAAGA GATACACAAA ATGATATACA CAATGATATA 6 06 0 
CCTAGTGATA TACCAAATAG TGACACACCA CCACCCATTA CTGATGATGA ATGGAATCAA 612 0 
TTGAAAAAAG ATTTTATATC TAATATGTTA CAAAATACAC AAAATACGGA ACCAAATATT 618 0 
TTACATGATA ATGTGGATAA TAATACCCAT CCTACCATGT CACGTCATAA TATGGACCAA 62 4 0 
AAACCTTTTA TTATGTCCAT ACATGATAGA AATTTATTTA GTGGAGAAGA ATACAATTAT 63 0 0 
GATATGTTTA ATAGTGGGAA TAATCCAATA AACATTAGTG ATTCAACAAA TAGTATGGAT 63 6 0 
AGTCTAACAA GTAACAACCA TAGTCCATAT AATGATAAAA ATGATTTATA TAGTGGTATC 642 0 
GACCTAATCA ACGACGCACT AAGTGGTAAT CATATTGATA TATATGATGA AATGCTCAAA 64 8 0 
CGAAAAGAAA ATGAATTATT CGGGACGCAA CATCATCCAA AAAATATAAC GTCTAACCGT 654 0 
GTCGTTACCC AAACAAGTAG TGACGACCCT ATAACCAATC AAATAAATTT GTTCCATAAA 660 0 
TGGTTAGATA GGCATAGAGA TATGTGCGAA AAGTGGAAAA ATAATCACGA ACGGTTACCC 6660
AAATTGAAAG AATTGTGGGA AAATGAGACA CATAGTGGTG ACATAAATAG TGGTATACCT 6 72 0 
AGTGGTAACC ATGTGTTGAA TACTGATGTT TCTATTCAAA TAGATATGGA TAATCCGAAA 678 0 
ACAATGAATG AATTTACTAA TATGGATACA AACCCCGACA AATCTACTAT GGATACTATA 6 84 0 
TTGGATGATC TAGAAAAATA TAACGAACCC TACTACTATG ATTTTTATAA ACATGATATC 6 90 0 
TATTATGATG TAAATGATGA TAAAGCATCT GAGGATCATA TAAATATGGA TCATAATAAG 6 96 0 
ATGGATAATA ATAATTCGGA TGTCCCCACT AACGTACAAA TTGAAATGAA TGTCATTAAT 7 02 0 
AATCAGGAGT TACTACAAAA TGAATATCCT ATATCGCATA TGTAGGGAAT ATGAAAATAA 7 08 0 
TAGATGTATA TATGTTTTTT TCTTTTTTTG TGTGTGTGCA GTTTATATTT TTTATTTGTA 714 0 
GATGTTATAT ATTTTTTTTA TTTGTGGGTT ATATTATAAT TTTTATTTAT GGGTTATATA 72 0 0 
TATATTTTTT TTTTTGTGCA TTTGTCTATT TTTTATTTGT GCTTTATATA TATATATATT 72 6 0 
TTATTCAGCT TGGACTTAAC CAGGCTGAAC TTGCT 72 95 

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2182 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Glu Pro Gly Gly Ser Gly Gly Arg Gly Ser Gly Gly Ser Ser Ser
1 5 10 15

Gly Lys Gly Lys Lys Asp Thr Ser Glu Tyr Ile Tyr Val Ser Asp Ala
20 25 30

Lys Asp Leu Leu Asp Arg Val Gly Glu Lys Val Tyr Glu Glu Lys Val
35 40 45

Lys Asn Gly Asp Ala Lys Lys Tyr Ile Glu Ala Leu Lys Gly Asn Leu
50 55 60

Asn Thr Ala Asn Gly Arg Ser Ser Glu Thr Ala Ser Ser Ile Glu Thr
65 70 75 80

Cys Thr Leu Val Lys Glu Tyr Tyr Glu Arg Val Asn Gly Asp Gly Lys
85 90 95

Arg His Pro Cys Arg Lys Asp Ala Lys Asn Glu Asp Val Asn Arg Phe
100 105 110

Ser Asp Thr Leu Gly Gly Gln Cys Thr Tyr Asn Arg Ile Lys Asp Ser
115 120 125

Gln Gln Gly Asp Asn Lys Val Gly Ala Cys Ala Pro Tyr Arg Arg Leu
130 135 140

His Leu Cys Asp Tyr Asn Leu Glu Ser Ile Asp Thr Thr Ser Thr Thr
145 150 155 160

His Lys Leu Leu Leu Glu Val Cys Met Ala Ala Lys Tyr Glu Gly Asn
165 170 175

Ser Ile Asn Thr His Tyr Thr Gln His Gln Arg Thr Asn Glu Asp Ser
180 185 190

Ala Ser Gln Leu Cys Thr Val Leu Ala Arg Ser Phe Ala Asp Ile Gly
195 200 205

Asp Ile Val Arg Gly Lys Asp Leu Tyr Leu Gly Tyr Asp Asn Lys Glu
210 215 220

Lys Glu Gln Arg Lys Lys Leu Glu Gln Lys Leu Lys Asp Ile Phe Lys
225 230 235 240

Lys Ile His Lys Asp Val Met Lys Thr Asn Gly Ala Gln Glu Arg Tyr
245 250 255

Ile Asp Asp Ala Lys Gly Gly Asp Phe Phe Gln Leu Arg Glu Asp Trp
260 265 270

Trp Thr Ser Asn Arg Glu Thr Val Trp Lys Ala Leu Ile Cys His Ala
275 280 285

Pro Lys Glu Ala Asn Tyr Phe Ile Lys Thr Ala Cys Asn Val Gly Lys
290 295 300

Gly Thr Asn Gly Gln Cys His Cys Ile Gly Gly Asp Val Pro Thr Tyr
305 310 315 320

Phe Asp Tyr Val Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu
325 330 335

Asp Phe Cys Arg Lys Lys Lys Lys Lys Leu Glu Asn Leu Gln Lys Gln
340 345 350

Cys Arg Asp Tyr Glu Gln Asn Leu Tyr Cys Ser Gly Asn Gly Tyr Asp
355 360 365

Cys Thr Lys Thr Ile Tyr Lys Lys Gly Lys Leu Val Ile Gly Glu His
370 375 380

Cys Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Glu Thr Trp Ile Asp
385 390 395 400

Asn Gln Lys Lys Glu Phe Leu Lys Gln Lys Arg Lys Tyr Glu Thr Glu
405 410 415

Ile Ser Gly Gly Gly Ser Gly Lys Ser Pro Lys Arg Thr Lys Arg Ala
420 425 430

Ala Arg Ser Ser Ser Ser Ser Asp Asp Asn Gly Tyr Glu Ser Lys Phe
435 440 445

Tyr Lys Lys Leu Lys Glu Val Gly Tyr Gln Asp Val Asp Lys Phe Leu
450 455 460

Lys Ile Leu Asn Lys Glu Gly Ile Cys Gln Lys Gln Pro Gln Val Gly
465 470 475 480

Asn Glu Lys Ala Asp Asn Val Asp Phe Thr Asn Glu Lys Tyr Val Lys
485 490 495

Thr Phe Ser Arg Thr Glu Ile Cys Glu Pro Cys Pro Trp Cys Gly Leu
500 505 510

Glu Lys Gly Gly Pro Pro Trp Lys Val Lys Gly Asp Lys Thr Cys Gly
515 520 525

Ser Ala Lys Thr Lys Thr Tyr Asp Pro Lys Asn Ile Thr Asp Ile Pro
530 535 540

Val Leu Tyr Pro Asp Lys Ser Gln Gln Asn Ile Leu Lys Lys Tyr Lys
545 550 555 560

Asn Phe Cys Glu Lys Gly Ala Pro Gly Gly Gly Gln Ile Lys Lys Trp
565 570 575

Gln Cys Tyr Tyr Asp Glu His Arg Pro Ser Ser Lys Asn Asn Asn Asn
580 585 590

Cys Val Glu Gly Thr Trp Asp Lys Phe Thr Gln Gly Lys Gln Thr Val
595 600 605

Lys Ser Tyr Asn Val Phe Phe Trp Asp Trp Val His Asp Met Leu His
610 615 620

Asp Ser Val Glu Trp Lys Thr Glu Leu Ser Lys Cys Ile Asn Asn Asn
625 630 635 640

Thr Asn Gly Asn Thr Cys Arg Asn Asn Asn Lys Cys Lys Thr Asp Cys
645 650 655

Gly Cys Phe Gln Lys Trp Val Glu Lys Lys Gln Gln Glu Trp Met Ala
660 665 670

Ile Lys Asp His Phe Gly Lys Gln Thr Asp Ile Val Gln Gln Lys Gly
675 680 685

Leu Ile Val Phe Ser Pro Tyr Gly Val Leu Asp Leu Val Leu Lys Gly
690 695 700

Gly Asn Leu Leu Gln Asn Ile Lys Asp Val His Gly Asp Thr Asp Asp
705 710 715 720

Ile Lys His Ile Lys Lys Leu Leu Asp Glu Glu Asp Ala Val Ala Val
725 730 735

Val Leu Gly Gly Lys Asp Asn Thr Thr Ile Asp Lys Leu Leu Gln His
740 745 750

Glu Lys Glu Gln Ala Glu Gln Cys Lys Gln Lys Gln Glu Glu Cys Glu
755 760 765

Lys Lys Ala Gln Gln Glu Ser Arg Gly Arg Ser Ala Glu Thr Arg Glu
770 775 780

Asp Glu Arg Thr Gln Gln Pro Ala Asp Ser Ala Gly Glu Val Glu Glu
785 790 795 800

Glu Glu Asp Asp Asp Asp Tyr Asp Glu Asp Asp Glu Asp Asp Asp Val
805 810 815

Val Gln Glu Glu Glu Glu Gly Lys Glu Glu Gly Thr Val Thr Glu Val
820 825 830

Thr Glu Val Thr Glu Val Val Glu Glu Thr Val Thr Glu Gln Glu Gly
835 840 845

Val Lys Pro Cys Asp Ile Val Gly Lys Leu Phe Glu Asp Asp Lys Ser
850 855 860

Leu Lys Glu Ala Cys Gly Leu Lys Tyr Gly Pro Gly Gly Lys Glu Lys
865 870 875 880

Phe Pro Asn Trp Lys Cys Val Thr Pro Ser Gly Val Ser Thr Ala Thr
885 890 895

Ser Gly Lys Asp Gly Ala Ile Cys Val Pro Pro Arg Arg Arg Arg Leu
900 905 910

Tyr Val Gly Gly Leu Ser Gln Trp Ala Ser Arg Gly Gly Asp Glu Thr
915 920 925

Thr Glu Val Ser Ser Glu Ala Thr Ser Ala Pro Ser Gln Ser Glu Ser
930 935 940

Glu Lys Leu Arg Thr Ala Phe Ile Glu Ser Ala Ala Ile Glu Thr Phe
945 950 955 960

Phe Leu Trp His Lys Tyr Lys Glu Glu Lys Lys Pro Pro Ala Thr Gln
965 970 975

Asp Gly Ala Gly Leu Gly Val Ser Leu Pro Glu Pro Ser Pro Pro Gly
980 985 990

Glu Asp Pro Gln Thr Gln Leu Gln Gln Thr Gly Val Ile Pro Pro Asp
995 1000 1005

Phe Leu Arg Gln Met Phe Tyr Thr Leu Ala Asp Tyr Lys Asp Ile Leu
1010 1015 1020

Tyr Ser Gly Ser Asn Asp Thr Ser Asp Thr Thr Gly Lys Gln Thr Pro
1025 1030 1035 1040

Ser Ser Ser Asn Asp Asn Leu Lys Asn Ile Val Leu Glu Ala Ser Gly
1045 1050 1055

Ser Thr Glu Gln Glu Lys Glu Lys Met Lys Gln Ile Gln Ala Lys Ile
1060 1065 1070

Lys Lys Ile Leu Asn Gly Ala Thr Ser Gly Val Pro Pro Val Thr Lys
1075 1080 1085

Asn Ser Val Lys Thr Pro Gln Gln Thr Trp Trp Glu Asn Ile Ala Lys
1090 1095 1100

Asp Ile Trp Asn Ala Met Val Cys Ala Leu Thr Tyr Lys Glu Asn Asp
1105 1110 1115 1120

Ala Arg Gly Thr Ser Ala Lys Ile Glu Gln Asn Lys Asp Leu Lys Lys
1125 1130 1135

Ala Leu Trp Asp Glu Ala Asn Lys Asn Thr Pro Ile Glu Lys Tyr Gln
1140 1145 1150

Tyr Thr Asn Val Lys Leu Glu Asp Glu Ser Gly Ala Lys Ser Asn Asp
1155 1160 1165

Thr Ile Gln Pro Pro Thr Leu Lys Asn Phe Val Glu Ile Pro Thr Phe
1170 1175 1180

Phe Arg Trp Leu His Glu Trp Gly Asn Ser Phe Cys Phe Glu Arg Ala
1185 1190 1195 1200

Lys Arg Leu Ala Gln Ile Lys His Glu Cys Met Asp Glu Asp Gly Glu
1205 1210 1215

Lys Gln Tyr Ser Gly Asp Gly Glu Tyr Cys Glu Glu Ile Phe Ser Lys
1220 1225 1230

Gln Tyr Asn Val Leu Gln Asp Leu Ser Ser Ser Cys Ala Lys Pro Cys
1235 1240 1245

Arg Leu Tyr Lys Thr Trp Ile Glu Lys Lys Lys Thr Glu Tyr Glu Lys
1250 1255 1260

Gln Gln Lys Ala Tyr Glu Gln Gln Lys Ser Asn Tyr Glu Asn Glu Gln
1265 1270 1275 1280

Lys Asp Lys Cys Gln Thr Gln Ser Asn Asn Asn Ala Asn Glu Phe Ser
1285 1290 1295

Arg Thr Leu Gly Ala Ser Pro Thr Ala Ala Glu Phe Leu Gln Lys Leu
1300 1305 1310

Gly Ser Cys Lys Asn Asp Asn Gly Tyr Glu Asn Gly Glu Asp Asn Lys
1315 1320 1325

Ile Asp Phe Lys Asn Pro Asp Lys Thr Phe Lys Glu Ala His Ser Cys
1330 1335 1340

Asp Pro Cys Pro Ile Thr Gly Val Lys Cys Gln Asn Gly His Cys Val
1345 1350 1355 1360

Gly Ser Ala Asn Gly Lys Glu Cys Lys Asn Asn Lys Ile Thr Ala Glu
1365 1370 1375

Asp Ile Lys Asn Lys Thr Asp Pro Asn Gly Asn Ile Glu Met Val Val
1380 1385 1390

Ser Asp Asp Ser Thr Asn Thr Phe Glu His Leu Gly Asp Cys Lys Ser
1395 1400 1405

Ser Gly Ile Phe Lys Gly Ile Arg Lys Asp Glu Trp Lys Cys Ala Asn
1410 1415 1420

Val Cys Gly Val Asp Ile Cys Thr Leu Glu Lys Lys Ile Lys Asn Gly
1425 1430 1435 1440

Gln Glu Gly Asp Lys Lys Tyr Ile Thr Met Lys Glu Leu Leu Lys Arg
1445 1450 1455

Trp Leu Glu Tyr Phe Leu Glu Asp Tyr Asn Arg Ile Arg Lys Lys Ile
1460 1465 1470

Lys Leu Cys Thr Lys Lys Glu Asp Gly Cys Lys Cys Ile Lys Gly Cys
1475 1480 1485

Ile Glu Lys Trp Val Gln Glu Lys Thr Lys Glu Trp Gln Lys Ile Asn
1490 1495 1500

Asp Thr Tyr Leu Glu Gln Tyr Lys Asn Asp Asp Gly Asn Thr Leu Thr
1505 1510 1515 1520

Asn Phe Leu Glu Gln Phe Gln Tyr Arg Thr Glu Phe Lys Asn Ala Ile
1525 1530 1535

Lys Pro Cys Asp Gly Leu Asp Gln Phe Lys Thr Ser Cys Gly Leu Asn
1540 1545 1550

Ser Thr Asp Asn Ser Gln Asn Gly Asn Asn Asn Asp Leu Val Leu Cys
1555 1560 1565

Leu Leu Asn Lys Leu Gln Lys Lys Ile Ser Glu Cys Lys Glu Gln His
1570 1575 1580

Ser Gly Gln Thr Gln Thr Pro Cys Asp Asn Ser Ser Leu Ser Gly Lys
1585 1590 1595 1600

Glu Ser Thr Leu Val Glu Asp Val Asp Asp Tyr Glu Glu Gln Asn Pro
1605 1610 1615

Glu Asn Lys Val Glu Gln Pro Lys Phe Cys Pro Asp Met Lys Glu Pro
1620 1625 1630

Lys Lys Glu Asn Asp Glu Glu Val Gly Thr Cys Gly Gly Asp Glu Glu
1635 1640 1645

Lys Lys Lys Val Glu Asp Ser Val Ile Glu Gln Lys Glu Glu Glu Ala
1650 1655 1660

Ala Ser Ala Pro Glu Glu Ser Pro Pro Leu Thr Pro Glu Ala Pro Lys
1665 1670 1675 1680

Lys Glu Glu Asn Val Val Pro Lys Pro Pro Pro Pro Pro Lys Lys Arg
1685 1690 1695

Arg Ile Lys Thr Arg Asn Val Leu Asp His Pro Ala Val Ile Pro Ala
1700 1705 1710

Leu Met Ser Ser Thr Ile Met Trp Ser Ile Gly Ile Gly Phe Ala Ala
1715 1720 1725

Phe Thr Tyr Phe Tyr Leu Lys Lys Lys Thr Lys Ser Ser Val Gly Asn
1730 1735 1740

Leu Phe Gln Ile Leu Gln Ile Pro Lys Ser Asp Tyr Asp Ile Pro Thr
1745 1750 1755 1760

Leu Lys Ser Ser Asn Arg Tyr Ile Pro Tyr Ala Ser Asp Arg His Lys
1765 1770 1775

Gly Lys Thr Tyr Ile Tyr Met Glu Gly Asp Ser Ser Gly Asp Glu Lys
1780 1785 1790

Tyr Ala Phe Met Ser Asp Thr Thr Asp Ile Thr Ser Ser Glu Ser Glu
1795 1800 1805

Tyr Glu Glu Leu Asp Ile Asn Asp Ile Tyr Val Pro Gly Ser Pro Lys
1810 1815 1820

Tyr Lys Thr Leu Ile Glu Val Val Leu Glu Pro Ser Lys Arg Asp Thr
1825 1830 1835 1840

Gln Asn Asp Ile His Asn Asp Ile Pro Ser Asp Ile Pro Asn Ser Asp
1845 1850 1855

Thr Pro Pro Pro Ile Thr Asp Asp Glu Trp Asn Gln Leu Lys Lys Asp
1860 1865 1870

Phe Ile Ser Asn Met Leu Gln Asn Thr Gln Asn Thr Glu Pro Asn Ile
1875 1880 1885

Leu His Asp Asn Val Asp Asn Asn Thr His Pro Thr Met Ser Arg His
1890 1895 1900

Asn Met Asp Gln Lys Pro Phe Ile Met Ser Ile His Asp Arg Asn Leu
1905 1910 1915 1920

Phe Ser Gly Glu Glu Tyr Asn Tyr Asp Met Phe Asn Ser Gly Asn Asn
1925 1930 1935

Pro Ile Asn Ile Ser Asp Ser Thr Asn Ser Met Asp Ser Leu Thr Ser
1940 1945 1950

Asn Asn His Ser Pro Tyr Asn Asp Lys Asn Asp Leu Tyr Ser Gly Ile
1955 1960 1965

Asp Leu Ile Asn Asp Ala Leu Ser Gly Asn His Ile Asp Ile Tyr Asp
1970 1975 1980

Glu Met Leu Lys Arg Lys Glu Asn Glu Leu Phe Gly Thr Gln His His
1985 1990 1995 2000

Pro Lys Asn Ile Thr Ser Asn Arg Val Val Thr Gln Thr Ser Ser Asp
2005 2010 2015

Asp Pro Ile Thr Asn Gln Ile Asn Leu Phe His Lys Trp Leu Asp Arg
2020 2025 2030

His Arg Asp Met Cys Glu Lys Trp Lys Asn Asn His Glu Arg Leu Pro
2035 2040 2045

Lys Leu Lys Glu Leu Trp Glu Asn Glu Thr His Ser Gly Asp Ile Asn
2050 2055 2060

Ser Gly Ile Pro Ser Gly Asn His Val Leu Asn Thr Asp Val Ser Ile
2065 2070 2075 2080

Gln Ile Asp Met Asp Asn Pro Lys Thr Met Asn Glu Phe Thr Asn Met
2085 2090 2095

Asp Thr Asn Pro Asp Lys Ser Thr Met Asp Thr Ile Leu Asp Asp Leu
2100 2105 2110

Glu Lys Tyr Asn Glu Pro Tyr Tyr Tyr Asp Phe Tyr Lys His Asp Ile
2115 2120 2125

Tyr Tyr Asp Val Asn Asp Asp Lys Ala Ser Glu Asp His Ile Asn Met
2130 2135 2140

Asp His Asn Lys Met Asp Asn Asn Asn Ser Asp Val Pro Thr Asn Val
2145 2150 2155 2160

Gln Ile Glu Met Asn Val Ile Asn Asn Gln Glu Leu Leu Gln Asn Glu
2165 2170 2175

Tyr Pro Ile Ser His Met
2180

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
ATCGATCAGC TGGGAAGAAA TACTTCATCT 30

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
ATCGATGGGC CCCGAAGTTT GTTCATTATT 30

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TCTCGTCAGC TGACGATCTC TAGTGCTATT

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ACGAGTGGGC CCTGTCACAA CTTCCTGAGT 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
AGACCTCAAT TTCTAAG 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
AATCGCGAGC ATCATCTG 

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
CCRAGRAGRC AARAAYTATG 

(2) INFORMATION FOR SEQ ID NO: 24: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCAWCKKARR AATTGWGG

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 291 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Val Cys Ile Pro Asp Arg Arg Tyr Gln Leu Cys Met Lys
20 25 30

Glu Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Asp Phe Cys Lys Asp Ile Arg Trp Ser Leu Gly Asp Phe Gly Asp
85 90 95

Ile Ile Met Gly Thr Asp Met Glu Gly Ile Gly Tyr Ser Lys Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Thr Asp Glu Lys Ala Gln Gln
115 120 125

Arg Arg Lys Gln Trp Trp Asn Glu Ser Lys Ala Gln Ile Trp Thr Ala
130 135 140












Met 


Met 




J. 


V CL J. 




Ya a 

Aaa 


Yo a 

Aaa 


Aaa 


Aaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


JL % D 










150 










155 










160 




Aaa 


Ya 

Aaa 


Aaa 


Aaa 


Xaa 


Xaa 


Xaa 


Xaa 


Glu 


Pro 


Gin 


He 


Tyr 


Arg 


Trp 


JL X6 








lb b 










170 








175 


Arcf 




Trp 




Arg 


Asp 


Tyr 


Val 


Ser 


Glu 


Leu 


Pro 


Thr 


Glu 


Val 








ion 
± o U 










185 










190 




Lys 


Lieu 


uys 


Glu 


Lys 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






135 










200 










205 








Y= = 

Aaa 


Cys 


Aaa 


vax 


Pro 


Pro 


Cys 


Gin 


Asn 


Ala 


Cys 


Lys 


Ser 


Tyr 


Asp 




^ X u 










215 










220 






Gin 


Trp 


He 


Thr 


Arg 


Lys 


Lys 


Asn 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


225 










230 










235 










240 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








245 










250 










255 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








260 










265 










270 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






275 










280 










285 







Cys Xaa Cys 
290



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 271 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Val Cys Ile Pro Asp Arg Arg Ile Gln Leu Cys
20 25 30

Ile Val Asn Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Phe Cys Asn Asp Leu Lys Asn
65 70 75 80

Ser Phe Leu Asp Tyr Gly His Leu Ala Met Gly Asn Asp Met Asp Phe
85 90 95

Gly Gly Tyr Ser Thr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Ser Glu His Lys Ile Lys Asn Phe Arg Lys
115 120 125

Glu Trp Trp Asn Glu Phe Arg Glu Lys Leu Trp Glu Ala Met Leu Ser
130 135 140

Glu His Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Glu
145 150 155 160

Leu Gln Ile Thr Gln Trp Ile Lys Glu Trp His Gly Glu Phe Leu Leu
165 170 175

Glu Arg Asp Asn Arg Ser Lys Leu Pro Lys Ser Lys Cys Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Cys Xaa Glu Lys Glu Cys Ile Asp Pro Cys Met
195 200 205

Lys Tyr Arg Asp Trp Ile Ile Arg Ser Lys Phe Xaa Xaa Xaa Xaa Xaa
210 215 220
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15 



25 



30 



35 



40 



45 



50 



55 



60 



Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

230 235 240 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

245 250 255 

Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Cvs 
260 265 270 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:



Cys 
1 


^CLCX 


Actd 


' Yr> za 
Add 


Aaa 

c 


Yja a 

Aaa 


Aaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


^ctd 


Ada. 


Aaa 


Aaa 


Aaa 


Aaa 


Xaa 


Xaa 


10 
Val 


Cys 


Val 


Pro 








^ u 










25 












Leu, 


Cys 


Leu 


Gly 


Asn 


He 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






35 










40 










45 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


50 










55 










60 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Glu 


65 










70 










75 




lie 


He 


Asn 


Lys 


Thr 


Phe 


Ala 


Asp 


He 


Arg 


Asp 


He 


He 


Asp 








85 










90 






Tyr 


Trp 


Asn 


Asp 


Leu 


Ser 


Asn 


Arg 


Xaa 


Xaa 


Xaa 


Xaa 








100 










105 










Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Asn 


Lys 


Lys 


Asn 


Asp 


Arg 




115 










120 










125 


Asp 


Glu 


Trp 


Trp 


Lys 


Val 


He 


Lys 


Lys 


Asp 


Val 


Trp 


Ser 


130 










135 








140 


Trp 


Phe 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


145 










150 








155 






He 


Pro 


Gin 


Phe 


Phe 


Arg 


Trp 


Phe 


Ser 


Glu 


Trp 


Gly Asp 


Gin 








165 










170 








Asp 


Lys 


Thr 


Lys 


Met 


He 


Glu 


Thr 


Leu 


Lys 


Val 


Glu 








180 










185 








Xaa 


Xaa 


Cys 


Xaa 


Asp 


Asp 


Asn 


Cys 


Lys 


Ser 


Lys 


Cys 


Asn 


Glu 




195 










200 






205 


Trp 


He 


Ser 


Lys 


Lys 


Lys 


Lys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 




210 










215 










220 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


225 










230 










235 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 










245 










250 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








260 










265 










Xaa 


Cys 


Xaa 


Xaa 


Cys 






















275 























15 

Pro Arc 
30 



80 

Gly Gly Thj 
95 

Xaa Xaa Xaa 
110 



160 

Asp Tyr Cys 

175 
Cys Xaa Xaa 
190 



240 

Xaa Xaa Xaa 

255 
Xaa Xaa Xaa 
270 



(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 282 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 



Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 




^dd 


Aaa 


1 








5 










10 








15 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Val 


Cys 


Glv 






Arg 


Arg 


Gin 






20 










25 










3 0 


Gin 


Leu 


Cys 


Leu 


Gly 


Tyr 


He 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


iA.ClCl 


Add 


Ada 






35 










40 










45 








Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Add 


Vaa 
Add 


Aaa 




50 










55 










60 








Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Lys 


Tl 

Jl, 


uys 


Asn 


65 










70 










75 






R n 

O \J 


Ala 


He 


Leu 


Gly 


Ser 


Tyr 


Ala 


Asp 


He 


Gly Asp 


He 


Val 


Arg 


V7X y 


Leu 










85 










90 








95 




Asp 


Val 


Trp 


Arg 


Asp 


He 


Asn 


Thr 


Asn 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 
Add 


Xaa 
Add 








100 










105 










110 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Lys 


Lys 


Gin 


A c ri 




Asn 




Glu 












120 










125 






Asn 


Arg 


Asn 


Lys 


Trp 


Trp 


Glu 


Lys 


Gin 


Arg 


Asn 


Leu 


He 


Trp 


Ser 




130 










135 










140 








Ser 


Met 


Val 


Lys 


His 


He 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


145 










150 










155 








160 


Xaa 


Xaa 


Xaa 


Xaa 


He 


Pro 


Gin 


Phe 


Leu 


Arg 


Trp 


Leu 


Lys. 


Glu 


Trp 


Gly 










165 










170 










175 


Asp 


Glu 


Phe 


Cys 


Glu 


Glu 


Met 


Gly 


Thr 


Glu 


Val 


Lys 


Gin 


Leu 


Glu 


Lys 


He 






180 










185 










190 




Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Glu 


Lys 


Lys 


Cys 


Lys 


Asn 


Ala 


Cys 






195 










200 










205 






Ser 


Ser 


Tyr 


Glu 


Lys 


Trp 


He 


Lys 


Glu 


Arg 


Lys 


Asn 


Xaa 


Xaa 


Xaa 


Xaa 




210 










215 








220 










Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


225 










230 










235 










240 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 










245 










250 










255 




Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








260 










265 










270 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Cys 















275 280 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 324 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

WO 96/40766 PCT/US96/09508

-80-

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ala Cys Ile Pro Pro Arg Arg Gln Lys
20 25 30

Leu Cys Leu His Tyr Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Phe Lys Arg Gln Met Phe
85 90 95

Tyr Thr Phe Ala Asp Tyr Arg Asp Ile Cys Leu Gly Thr Asp Ile Ser
100 105 110

Ser Lys Lys Asp Thr Ser Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
115 120 125

Xaa Xaa Xaa Xaa Xaa Lys Ile Ser Asn Ser Ile Arg Tyr Arg Lys Ser
130 135 140

Trp Trp Glu Thr Asn Gly Pro Val Ile Trp Glu Gly Met Leu Cys Ala
145 150 155 160

Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Arg Pro Gln Phe Leu
195 200 205

Arg Trp Leu Thr Glu Trp Gly Glu Asn Phe Cys Lys Glu Gln Lys Lys
210 215 220

Glu Tyr Lys Val Leu Leu Ala Lys Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Cys Val Ala Cys Lys Asp Gln Cys
245 250 255

Lys Gln Tyr His Ser Trp Ile Gly Ile Trp Ile Asp Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys
305 310 315 320

Xaa Xaa Xaa Cys

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Ala Cys Ala Pro Tyr Arg Arg Leu His Leu Cys Asp Tyr Asn Leu Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Leu Cys Thr Val Leu






Ala 

-65 


50 










b b 










60 










Arcr 


Ser 


Phe 


Ala 


Asp 
/ u 


Tie 

JL J. C 


Gly Asp 


X xe 


vax 
75 


Arg 


Gly 

- - 


Lys 


Asp 


Leu 
80 


Tyr 


Leu 


Glv 






Aon 
noli 


Lys 


Xaa 


Xaa 


xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








ft ^ 










90 










95 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


^dd 


Xaa 


Xaa 


Y= =» 

Aaa 


Aaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






100 










105 










110 




Xaa 


Xaa 
115 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 
120 


Xaa 


Yrj = 

Aaa 


Aaa 


Aaa 


Lys 
125 


Gly 


Gly Asp 


Phe 


Phe 


Gin 


Leu 


Arg 


Glu 


Asp 


Trp 


Trp 


Thr 


Ser 


Asn 


Arg 


Glu 


Thr 


Val 


Trp 


13 0 










X J b 










140 






Lys 


Ala 


Leu 


He 


Cys 


His 


Ala 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


145 


Xaa 








X o u 










155 








160 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








16 5 










X /U 










175 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Add 


Yaa 
Add 


V d X 


Pro 


Gxn 


Tyr 


Leu 


Ara 






1 ft n 

X O w 










185 










190 






Phe 

J 


Glu 


Glu 




AX d 


Glu 
2 00 


Asp 


fne 


Cys 


Arg 


Lys 
205 


Lys 


Lys 


Lys 


Lvs 


Leu 
210 


Glu 


Asn 


Leu 




T .VG 

Z Xb 


Gin 


Cys 


Aaa 


Aaa 


Xaa 
220 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 
225 


Xaa 


Xaa 


Xaa 


Xaa 


Yaa 

o 1 n 
^ J u 


Add 


Xaa 


Xaa 


Aaa 


Aaa 
235 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 
240 


Thr 


Asn 


Cys 


Ser 


Val 




Cys 


Arg 


Met 


lyr 


VjXU 


inr 


Trp 


He 


Asp 


Asn 


Gin 








^ ^ ^ 










o c r\ 
^b U 








255 




Lvs 


Lvs 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Ya a 

Aaa 


Yo =a 

Aaa 


Aaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






260 










265 










270 


Xaa 


Xaa 


Xaa 


Xaa 


Y;^ 7^ 


Ya j=i 
Add 


Xaa 


Xaa 


Aaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 














280 










285 






Xaa 


Add 


Ada. 


Aaa 


Aaa 


Aaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 




^ .7 W 










2 95 










300 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


305 










310 










315 










320 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








325 










330 










335 


Xaa 


Xaa_ 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








3 4 0' 










3 45 










3 50 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Cys 
















355 










360 

















(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 411 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
1 5 10 15

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Ala Cys Ala Pro Tyr Arg Arg Leu His Val Cys Asp Gln Asn Leu Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

80 

lie Cys Thi 
95 

Arg Gly Arc 
110 



60 



65 



65 










70 










75 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Gin 










85 










90 








Lieu 


Ala 


Arg 


Ser 


Phe 


Ala 


Asp 


He 


Gly Asp 


He 


Val 








100 










105 










Asp 


Leu 


Tyr 


Leu 


Gly Asn 


Pro 


Gin 


Glu 


Xaa 


Xaa 


Xaa 


Xaa 






115 










120 










125 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 




130 










135 










140 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Asn 


Asp 


Pro 


Glu 


Phe 


Phe 


145 










150 








155 




Glu 


Asp 


Trp 


Trp 


Thr 


Ala 


Asn 


Arg 


Glu 


Thr 


Val 


Trp 


Lys 










165 










170 




Cys 


Asn 


Ala 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 








180 










185 








Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






195 










200 










205 


Xaa 


Xaa 


Xaa 


Xaa 


Val 


Pro 


Gin 


Tyr 


Leu 


Arg 


Trp 


Phe 


Glu 


Glu 


210 










215 






220 




Asp 


Phe 


Cys 


Arg 


Lys 


Lys 


Asn 


Lys 


Lys 


He 


Lys 


Asp 


225 










230 










235 


Asn 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 










245 










250 








Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








260 










265 








Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


He 


Ser 


Cys 


Leu 


Tyr 


Ala 


Cys 


Val 




275 










280 








285 


Asp 


Trp 


lie 


Asn 


Asn 


Gin 


Lys 


Glu 


Xaa 


Xaa 


Xaa 


Xaa 




290 










295 








3 00 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


305 










310 










315 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 










325 










330 








Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








340 










A t; 

J T D 










Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






355 










360 










365 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 




370 










375 










380 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


385 










390 










395 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Cys 














405 










410 










(2) INFORMATION FOR SEQ ID NO: 32:








(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 411 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 



160 

Ala He Thr 

175 
Xaa Xaa Xaa 
190 



240 

Xaa Cys Xaa 

255 
Xaa Xaa Xaa 
270 



320 

Xaa Xaa Xaa 

335 
Xaa Xaa Xaa 
350 



400 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Val Phe Leu Pro Pro Arg Arg Glu His Met Cys Thr Ser Asn
50 55 60

Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
85 90 95

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ala Met Cys Arg Ala Val Arg Tyr
115 120 125

Ser Phe Ala Asp Leu Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp
130 135 140

Glu Asp Lys Ser Ser Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp
180 185 190

Trp Trp Glu Ala Asn Arg His Gln Val Trp Arg Ala Met Lys Cys Ala
195 200 205

Thr Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ile Pro
210 215 220

Gln Arg Leu Arg Trp Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala
225 230 235 240

Gln Ser Gln Glu Tyr Asp Lys Leu Lys Lys Ile Cys Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Gly
260 265 270

Lys Cys Lys Ala Ala Cys Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp
275 280 285

Asn Glu Gln Trp Arg Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
305 310 315 320

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
325 330 335

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
340 345 350

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys
355 360 365

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
370 375 380

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
385 390 395 400

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys
405 410



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 311 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa 






1 








5 










10 










15 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Ala 


Cys 


Met 


Pro 


Pro 


ArQ 


Arcr 


Gin 




J-iC Li 








20 










25 








30 




Cys 


Leu 


Tyr 


Tyr 


He 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 
Add 






35 










40 










45 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 
Add 




50 










55 










60 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 
Add 


65 










70 










75 








80 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Gin 


Phe 


Leu 


Ara 


Ser 


Met 


Met 










85 










90 








95 




Tyr 


Thr 


Phe 


Gly 
100 


Asp 


Tyr 


Arg 


Asp 


He 
105 


Cys 


Leu 


Asn 


Thr 


Asp 
110 


He 


Ser 


Lys 


Lys 


Gin 


Asn 


Asp 


Val 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 

Add 






115 










120 










125 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Ser 


Lys 


Ser 


Pro 


Ser 


Glv 


Leu 


Ser 


Arg 


m n 


m n 

V7 X U 




130 










135 










140 








Trp 


Trp 


Lys 


Thr 


Asn 


Gly 


Pro 


Glu 


He 


Trp 


Lvs 


Glv 


Met 


Leu 


V— jr O 


Ala 

t\.±. d 


145 










150 










155 






160 


Leu 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 




Xaa 
Add 


Xaa 








165 










170 










175 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








180 










185 










190 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Lys 


Pro 


Gin 


Phe 


Leu 


Arcr 


TrD 

X J. ^ 


Met 


-i- ^ 


OA 11 






195 










200 








205 








Trp 


Gly 


Glu 


Glu 


Phe 


Cys 


Ala 


Glu 


Arg 


Gin 


Lvs 


Lvs 


Glu 


Asn 




Tie 

J. X c 




210 










215 








220 








Lys 


Asp 


Ala 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cvs 


Xaa 


Xaa 


Add 


225 


His 








230 










235 








•5 A n 


Lys 


Arg 


Cys 


Asn 


Gin 


Ala 


Cys 


Arg 


Ala 


Tvr 


Gin 


Glu 


lyi: 


Va T 
vox 




Asn 








245 










250 








255 




Lys 


Lys 


Lys 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 








260 










265 










270 




Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 






275 










280 










285 






Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Cys 




290 










295 










300 








Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Xaa 


Cys 




















305 










310 





















(2) INFORMATION FOR SEQ ID NO: 34: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Pro Arg Arg Gln Xaa Leu Cys
1 5
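
As an illustration only, the consensus peptide of SEQ ID NO: 34 can be turned into a search pattern. The sketch below is written in Python, which is not part of this specification. The names THREE_TO_ONE, to_one_letter and consensus_regex are hypothetical, and reading Xaa as "any single residue" is an assumption about how the consensus is meant to be used. The test fragment is residues 24 to 37 of SEQ ID NO: 29 as given in this listing.

```python
import re

# Standard three-letter to one-letter amino acid codes; Xaa is kept as X.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def to_one_letter(listing: str) -> str:
    """Convert a whitespace-separated three-letter listing to one-letter code."""
    return "".join(THREE_TO_ONE[token] for token in listing.split())


def consensus_regex(one_letter: str) -> re.Pattern:
    """Build a regex in which X (Xaa) matches any single residue (assumption)."""
    return re.compile(one_letter.replace("X", "."))


# Consensus peptide of SEQ ID NO: 34.
motif = to_one_letter("Pro Arg Arg Gln Xaa Leu Cys")

# Residues 24 to 37 of SEQ ID NO: 29, used here only as a test string.
fragment = to_one_letter(
    "Ala Cys Ile Pro Pro Arg Arg Gln Lys Leu Cys Leu His Tyr"
)

match = consensus_regex(motif).search(fragment)
print(motif, fragment, match.start() if match else None)
```

The same pattern could be run against any one-letter protein sequence; a stricter reading of position 5 (for example, only Lys or Glu) would need a character class in place of the wildcard.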

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: gDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
CCRAGRAGRC AARAAYTATG 
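
The primer of SEQ ID NO: 35 is written with IUPAC ambiguity codes (R = A or G, Y = C or T), so it denotes a pool of oligonucleotides rather than a single sequence. The following Python sketch, offered only as an illustration and not as part of the disclosed method, counts and enumerates that pool. The function names degeneracy and expand are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


seq35 = "CCRAGRAGRCAARAAYTATG"  # SEQ ID NO: 35 with the spacing removed
pool = expand(seq35)
print(degeneracy(seq35))  # 32
print(pool[0])            # CCAAGAAGACAAAAACTATG
```

Five two-fold ambiguous positions give 2^5 = 32 members, which is the figure printed by the first call.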

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
CCSMGSMGSC AGCAGYTSTG 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Phe Ala Asp Xaa Xaa Asp Ile
1 5 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
TTTGCWGATW WWSGWGATAT

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
TTCGCSGATW WCSGSGACAT 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

Pro Gln Phe Xaa Arg Trp
1 5 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
CCAWCKKARR AATTGWGG 
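
SEQ ID NO: 41 appears to be written in the antisense orientation relative to the peptide of SEQ ID NO: 40. A minimal Python sketch, assuming the standard genetic code and the standard IUPAC complement table, shows one way to check this. It is an illustration rather than part of the specification, and the helper names reverse_complement and residues are hypothetical.

```python
from itertools import product

# IUPAC codes needed for this primer, and their complements.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}
COMPLEMENT = str.maketrans("ACGTRYSWKM", "TGCAYRSWMK")

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def residues(codon: str) -> set[str]:
    """Set of residues encoded by a (possibly degenerate) codon."""
    return {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC[b] for b in codon))
    }


seq41 = "CCAWCKKARRAATTGWGG"  # SEQ ID NO: 41 with the spacing removed
sense = reverse_complement(seq41)
codons = [sense[i:i + 3] for i in range(0, len(sense), 3)]
encoded = [residues(codon) for codon in codons]

for codon, aa in zip(codons, encoded):
    print(codon, sorted(aa))
```

Under these assumptions the sense strand reads CCW-CAA-TTY-YTM-MGW-TGG, which covers Pro, Gln, Phe, Leu or Phe, Arg and Trp as in SEQ ID NO: 40. The degenerate fifth codon also admits Ser, a by-product of the ambiguity codes rather than something stated in the listing.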

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CCASCKGWAG AWCTGSGG 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: N-terminal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

Glu Trp Gly Xaa Xaa Xaa Cys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 
CAAWAWTCWT CWCCCCATTC 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 



CAGWASTCST CSCCCCACTC 
WE CLAIM:

1. A composition comprising a nucleotide sequence of the DBL gene family, wherein said nucleotide
sequence is selected from the group consisting of the var-1, var-2, var-3 and var-7 genes.

2. The composition of claim 1, wherein the nucleotide sequence of the var-1, var-2, var-3 or var-7
gene encodes a cysteine-rich domain homologous to a cysteine-rich domain of a Duffy Antigen Binding Protein (DABP)
derived from Plasmodium vivax and a Sialic Acid Binding Protein (SABP) derived from Plasmodium falciparum.

3. The composition of claim 1, wherein the nucleotide sequence of the var-1, var-2, var-3 or var-7
gene encodes a cysteine-rich interdomain region between a first domain and a second domain.

4. The composition of claim 1, wherein the nucleotide sequence is derived from a coding region of
SEQ ID NO:13 or SEQ ID NO:15.

5. A composition comprising a polypeptide encoded by a nucleotide sequence of the DBL gene family,
wherein said polypeptide is encoded by a var-1, var-2, var-3 or var-7 gene.

6. The composition of claim 5, wherein the polypeptide comprises a sequence of amino acid residues
homologous to cysteine-rich domains of a Duffy Antigen Binding Protein (DABP) derived from Plasmodium vivax and
a Sialic Acid Binding Protein (SABP) derived from Plasmodium falciparum.

7. The composition of claim 5, wherein the polypeptide comprises a sequence of about 300 to 400
amino acid residues occurring in the cysteine-rich interdomain region between a first domain and a second domain of
a polypeptide encoded by the var-1, var-2, var-3 or var-7 gene.

8. The composition of claim 5, wherein the polypeptide comprises a sequence of amino acid residues
of SEQ ID NO:14 or SEQ ID NO:16.

9. The composition of claim 5, wherein the polypeptide comprises a sequence of about 50 to about
325 amino acid residues of SEQ ID NO:14 or SEQ ID NO:16.

10. The composition of claim 5, wherein the polypeptide comprises a sequence of about 75 to about
300 amino acid residues of SEQ ID NO:14 or SEQ ID NO:16.

11. The composition of claim 5, wherein the polypeptide comprises a sequence of about 100 to about
250 amino acid residues of SEQ ID NO:14 or SEQ ID NO:16.

12. The composition of claim 5, further comprising a pharmaceutically acceptable carrier and an
isolated Duffy Antigen Binding Protein (DABP) binding domain polypeptide, a Sialic Acid Binding Protein (SABP)
binding domain polypeptide, or a combination thereof, in an amount sufficient to induce a protective immune response
to Plasmodium merozoites in a mammal.

13. The composition of any of the preceding claims for use in inducing a protective immune response
to Plasmodium merozoites in a mammal.

14. Use of the composition of any one of claims 1-12 in the preparation of a medicament for inducing
a protective immune response to Plasmodium merozoites in a mammal.

15. A method of inducing a protective immune response to Plasmodium merozoites in a mammal,
comprising administering to a mammal an immunologically effective amount of a pharmaceutical composition
comprising a pharmaceutically acceptable carrier and an isolated cysteine-rich polypeptide encoded by a var gene
selected from the group of genes consisting of var-1, var-2, var-3 and var-7 genes.

16. The method of claim 15, further comprising administering to said mammal an immunologically
effective amount of a Duffy Antigen Binding Protein (DABP) binding domain polypeptide, a Sialic Acid Binding Protein
(SABP) binding domain polypeptide, or a combination thereof.
FIG. 3

UNIEBP5 and 5A: P R R Q K/E L C

UNIEBP5, for A+T biased codon usage:
CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG

UNIEBP5A, for G+C biased codon usage:
CC(C/G)-(C/A)G(C/G)-(C/A)G(C/G)-CAG-CAG-(C/T)T(C/G)-TG

UNIEBP5 B and C: F A D X G/R D I

UNIEBP5B, for A+T biased codon usage:
TTT-GC(A/T)-GAT-(A/T)(A/T)(A/T)-(G/C)G(A/T)-GAT-AT

UNIEBP5C, for G+C biased codon usage:
TTC-GC(G/C)-GAT-(A/T)(A/T)C-(G/C)G(G/C)-GAC-AT

UNIEBP3 and 3A: P Q F L/F R W

UNIEBP3, for A+T biased codon usage:
CCA-(A/T)C(T/G)-(T/G)A(A/G)-(A/G)AA-TTG-(A/T)GG

UNIEBP3A, for G+C biased codon usage:
CCA-(C/G)C(G/T)-G(A/T)A-GA(A/T)-CTG-(C/G)GG

UNIEBP3 B and C: E W G D/E D/E Y/F C

UNIEBP3B, for A+T biased codon usage:
CA-A(A/T)A-(A/T)TC-(A/T)TC-(A/T)CC-CCA-TTC

UNIEBP3C, for G+C biased codon usage:
CA-G(A/T)A-(G/C)TC-(G/C)TC-(G/C)CC-CCA-CTC
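
The bracketed alternatives in FIG. 3 and the single-letter ambiguity codes used in the sequence listing are two notations for the same primers. The Python sketch below is illustrative only: to_iupac is a hypothetical helper, it assumes every bracket holds exactly two different bases, and the UNIEBP5 string is transcribed with hyphens as codon separators. It converts the UNIEBP5 entry and compares the result with SEQ ID NO: 35.

```python
import re

# Two-base alternatives and their IUPAC ambiguity codes.
PAIR_TO_IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def to_iupac(primer: str) -> str:
    """Rewrite '(A/G)'-style alternatives as single IUPAC letters."""

    def replace_pair(match: re.Match) -> str:
        return PAIR_TO_IUPAC[frozenset(match.group(1) + match.group(2))]

    collapsed = re.sub(r"\(([ACGT])/([ACGT])\)", replace_pair, primer.upper())
    # Drop codon separators and any other non-letter characters.
    return re.sub(r"[^A-Z]", "", collapsed)


uniebp5 = "CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG"
seq35 = "CCRAGRAGRCAARAAYTATG"  # SEQ ID NO: 35 with the spacing removed

converted = to_iupac(uniebp5)
print(converted, converted == seq35)
```

The order inside a bracket does not matter here, so (A/G) and (G/A) both map to R.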